Quick Tip: Replacing non-UTF-8 Characters

Vertica database servers expect to receive all data in UTF-8 and Vertica outputs all data in UTF-8. However, you can load or insert data into…

OpenText  profile picture
OpenText

July 16, 20212 min read

Vertica database servers expect to receive all data in UTF-8 and Vertica outputs all data in UTF-8. However, you can load or insert data into Vertica that is non UTF-8, but you’ll want to clean it up.

I used to recommend the REGEXP_REPLACE function for that task, but now there is a better way!

Vertica 10.1.x introduces the MAKEUTF8 built-in function that coerces a string to UTF-8 by removing or replacing non-UTF-8 characters.

The old way of removing non-UTF-8 characters:


dbadmin=> \d nonutf_test;
                                       List of Fields by Tables
 Schema |    Table    | Column |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------------+--------+--------------+------+---------+----------+-------------+-------------
 public | nonutf_test | c      | varchar(100) |  100 |         | f        | f           |
(1 row)

dbadmin=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT c, REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b') FROM nonutf_test ;
    c    | REGEXP_REPLACE
---------+----------------
 ▒Test!▒ | Test!
(1 row)

dbadmin=> SELECT * FROM nonutf_test;
    c
---------
 ▒Test!▒
(1 row)

dbadmin=> UPDATE /*+ DIRECT */ nonutf_test SET c = REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b');
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT * FROM nonutf_test;
   c
-------
 Test!
(1 row)

Now the new and improved way:


verticademos=> TRUNCATE TABLE nonutf_test;
TRUNCATE TABLE

verticademos=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
 OUTPUT
--------
      1
(1 row)

verticademos=> SELECT c, makeutf8(c) FROM nonutf_test;
    c    | makeutf8
---------+----------
 ▒Test!▒ | Test!
(1 row)

verticademos=> SELECT c, makeutf8(c) "Removed non-UTF8 Junk!" FROM nonutf_test;
    c    | Removed non-UTF8 Junk!
---------+------------------------
 ▒Test!▒ | Test!
(1 row)

verticademos=> UPDATE nonutf_test SET c = makeutf8(c);
 OUTPUT
--------
      1
(1 row)

verticademos=> SELECT * FROM nonutf_test;
   c
-------
 Test!
(1 row)

Share this post

Share this post to x. Share to linkedin. Mail to
OpenText avatar image

OpenText

OpenText, The Information Company, enables organizations to gain insight through market-leading information management solutions, powered by OpenText Cloud Editions.

See all posts

More from the author

Step right up! OpenText World 2025

Step right up! OpenText World 2025

Where the OpenText community comes together

October 22, 2025

3 min read

OpenText World 2025: What’s in it for you?

OpenText World 2025: What’s in it for you?

Find inspiration and innovation at the premier AI conference

October 16, 2025

4 min read

OpenText World 2025: Learn, connect, and grow

OpenText World 2025: Learn, connect, and grow

Hear from 20+ customer speakers; choose from 150+ sessions

October 09, 2025

2 min read

Stay in the loop!

Get our most popular content delivered monthly to your inbox.