Quick Tip: Replacing non-UTF-8 Characters

Vertica database servers expect to receive all data in UTF-8 and Vertica outputs all data in UTF-8. However, you can load or insert data into…

OpenText  profile picture
OpenText

July 16, 20212 min read

Vertica database servers expect to receive all data in UTF-8 and Vertica outputs all data in UTF-8. However, you can load or insert data into Vertica that is non UTF-8, but you’ll want to clean it up.

I used to recommend the REGEXP_REPLACE function for that task, but now there is a better way!

Vertica 10.1.x introduces the MAKEUTF8 built-in function that coerces a string to UTF-8 by removing or replacing non-UTF-8 characters.

The old way of removing non-UTF-8 characters:


dbadmin=> \d nonutf_test;
                                       List of Fields by Tables
 Schema |    Table    | Column |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------------+--------+--------------+------+---------+----------+-------------+-------------
 public | nonutf_test | c      | varchar(100) |  100 |         | f        | f           |
(1 row)

dbadmin=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT c, REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b') FROM nonutf_test ;
    c    | REGEXP_REPLACE
---------+----------------
 ▒Test!▒ | Test!
(1 row)

dbadmin=> SELECT * FROM nonutf_test;
    c
---------
 ▒Test!▒
(1 row)

dbadmin=> UPDATE /*+ DIRECT */ nonutf_test SET c = REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b');
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT * FROM nonutf_test;
   c
-------
 Test!
(1 row)

Now the new and improved way:


verticademos=> TRUNCATE TABLE nonutf_test;
TRUNCATE TABLE

verticademos=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
 OUTPUT
--------
      1
(1 row)

verticademos=> SELECT c, makeutf8(c) FROM nonutf_test;
    c    | makeutf8
---------+----------
 ▒Test!▒ | Test!
(1 row)

verticademos=> SELECT c, makeutf8(c) "Removed non-UTF8 Junk!" FROM nonutf_test;
    c    | Removed non-UTF8 Junk!
---------+------------------------
 ▒Test!▒ | Test!
(1 row)

verticademos=> UPDATE nonutf_test SET c = makeutf8(c);
 OUTPUT
--------
      1
(1 row)

verticademos=> SELECT * FROM nonutf_test;
   c
-------
 Test!
(1 row)

Share this post

Share this post to x. Share to linkedin. Mail to
OpenText avatar image

OpenText

OpenText, The Information Company, enables organizations to gain insight through market-leading information management solutions, powered by OpenText Cloud Editions.

See all posts

More from the author

Elevate together at OpenText Summit London 2026

Elevate together at OpenText Summit London 2026

Take control of your AI journey

April 06, 2026

3 min read

OpenText Summit Paris 2026

OpenText Summit Paris 2026

Join us this April in the City of Light

April 01, 2026

3 min read

OpenText Summit Munich 2026

OpenText Summit Munich 2026

One city, two days, endless innovation

March 23, 2026

3 min read

Stay in the loop!

Receive regular insights, updates, and resources—right in your inbox.