Quick Tip: Replacing non-UTF-8 Characters

Vertica database servers expect to receive all data in UTF-8 and Vertica outputs all data in UTF-8. However, you can load or insert data into…

OpenText  profile picture
OpenText

July 16, 20212 min read

Vertica database servers expect to receive all data in UTF-8 and Vertica outputs all data in UTF-8. However, you can load or insert data into Vertica that is non UTF-8, but you’ll want to clean it up.

I used to recommend the REGEXP_REPLACE function for that task, but now there is a better way!

Vertica 10.1.x introduces the MAKEUTF8 built-in function that coerces a string to UTF-8 by removing or replacing non-UTF-8 characters.

The old way of removing non-UTF-8 characters:


dbadmin=> \d nonutf_test;
                                       List of Fields by Tables
 Schema |    Table    | Column |     Type     | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------------+--------+--------------+------+---------+----------+-------------+-------------
 public | nonutf_test | c      | varchar(100) |  100 |         | f        | f           |
(1 row)

dbadmin=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT c, REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b') FROM nonutf_test ;
    c    | REGEXP_REPLACE
---------+----------------
 ▒Test!▒ | Test!
(1 row)

dbadmin=> SELECT * FROM nonutf_test;
    c
---------
 ▒Test!▒
(1 row)

dbadmin=> UPDATE /*+ DIRECT */ nonutf_test SET c = REGEXP_REPLACE(c, '[^\t\r\n\x20-\x7E]+', '', 1, 0, 'b');
 OUTPUT
--------
      1
(1 row)

dbadmin=> SELECT * FROM nonutf_test;
   c
-------
 Test!
(1 row)

Now the new and improved way:


verticademos=> TRUNCATE TABLE nonutf_test;
TRUNCATE TABLE

verticademos=> INSERT INTO nonutf_test SELECT E'\xa0' || 'Test!' || E'\xa0';
 OUTPUT
--------
      1
(1 row)

verticademos=> SELECT c, makeutf8(c) FROM nonutf_test;
    c    | makeutf8
---------+----------
 ▒Test!▒ | Test!
(1 row)

verticademos=> SELECT c, makeutf8(c) "Removed non-UTF8 Junk!" FROM nonutf_test;
    c    | Removed non-UTF8 Junk!
---------+------------------------
 ▒Test!▒ | Test!
(1 row)

verticademos=> UPDATE nonutf_test SET c = makeutf8(c);
 OUTPUT
--------
      1
(1 row)

verticademos=> SELECT * FROM nonutf_test;
   c
-------
 Test!
(1 row)

Share this post

Share this post to x. Share to linkedin. Mail to
OpenText avatar image

OpenText

OpenText, The Information Company, enables organizations to gain insight through market-leading information management solutions, powered by OpenText Cloud Editions.

See all posts

More from the author

OpenText World 2025: Elevate together

OpenText World 2025: Elevate together

Register today!

September 11, 2025

2 min read

Unlocking AI’s full potential

Unlocking AI’s full potential

Recent survey reveals mature practices and a solid foundation are key

September 08, 2025

2 min read

Top 5 reasons you can’t miss OpenText World 2025

Top 5 reasons you can’t miss OpenText World 2025

Join us in Nashville this November

September 05, 2025

4 min read

Stay in the loop!

Get our most popular content delivered monthly to your inbox.