• Node name
  • User information
  • Session, transaction, and statement IDs
  • Plan information
  • Operator name
  • Counter name
  • Counter value
  • Counters change from one operator to another.

    Do: Update System Config (if needed)

    You might want to change some system parameters to improve performance. Do this with caution.

    Don’t: Underestimate Data Extraction

    If your query returns a large result set, moving the data to the client can take a long time. Redirecting client output to /dev/null does not avoid this cost, because the data still travels to the client. Consider instead storing the result set in a LOCAL TEMPORARY TABLE.
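
    For example, a minimal sketch (the table and query here are hypothetical placeholders):

    -- Materialize the result set on the server instead of shipping it to the client.
    CREATE LOCAL TEMPORARY TABLE my_results
    ON COMMIT PRESERVE ROWS AS
    SELECT customer_id, count(*) AS order_count
    FROM my_large_table
    GROUP BY customer_id;

    This lets you time the query itself without paying the client transfer cost.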

    Useful Queries

    The following query checks the data distribution for a given table. This is often useful when examining a plan for which no statistics are available:
    SELECT
        projection_name,
        node_name,
        SUM(row_count) AS row_count,
        SUM(used_bytes) AS used_bytes,
        SUM(wos_row_count) AS wos_row_count,
        SUM(wos_used_bytes) AS wos_used_bytes,
        SUM(ros_row_count) AS ros_row_count,
        SUM(ros_used_bytes) AS ros_used_bytes,
        SUM(ros_count) AS ros_count
    FROM projection_storage
    WHERE
        anchor_table_schema = :schema AND
        anchor_table_name = :table
    GROUP BY 1, 2
    ORDER BY 1, 2;
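
    The queries in this section use vsql variables such as :schema and :table. A minimal sketch of binding them before running a query, assuming vsql's psql-style \set (the values are placeholders):

    \set schema '\'public\''
    \set table '\'customer_dim\''

    The escaped quotes make the substituted values SQL string literals.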

    The following query shows the non-default configuration parameters:
    SELECT
    parameter_name, current_value, default_value, description
    FROM v_monitor.configuration_parameters
    WHERE current_value <> default_value
    ORDER BY parameter_name;
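
    If a parameter does need to change (see the Do above), one possible sketch uses the built-in SET_CONFIG_PARAMETER function; the parameter and value here are only an example:

    -- Example only: adjust the maximum number of client sessions.
    SELECT SET_CONFIG_PARAMETER('MaxClientSessions', 100);

    Newer releases also support an ALTER DATABASE ... SET PARAMETER form; check the documentation for your version before changing anything.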

    The following query checks encoding and compression for a given table:
    SELECT
        cs.projection_name,
        cs.column_name,
        SUM(cs.row_count) AS row_count,
        SUM(cs.used_bytes) AS used_bytes,
        MAX(pc.encoding_type) AS encoding_type,
        MAX(cs.encodings) AS encodings,
        MAX(cs.compressions) AS compressions
    FROM column_storage cs
    INNER JOIN projection_columns pc
        ON cs.column_id = pc.column_id
    WHERE
        anchor_table_schema = :schema AND
        anchor_table_name = :table
    GROUP BY 1, 2
    ORDER BY 1, 2;

    The following query retrieves the EXPLAIN plan for a given query:
    SELECT
    path_line
    FROM v_internal.dc_explain_plans
    WHERE
    transaction_id=:trxid and
    statement_id=:stmtid
    ORDER BY
    path_id, path_line_index;
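
    If you do not already know the transaction and statement IDs, one way to look them up for recent queries (a sketch against v_monitor.query_requests) is:

    SELECT transaction_id, statement_id, left(request, 60) AS request
    FROM v_monitor.query_requests
    ORDER BY start_timestamp DESC
    LIMIT 10;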

    The following shows the resource acquisition for a given query:
    SELECT
        a.node_name,
        a.queue_entry_timestamp,
        a.acquisition_timestamp,
        (a.acquisition_timestamp - a.queue_entry_timestamp) AS queue_wait_time,
        a.pool_name,
        a.memory_inuse_kb AS mem_kb,
        (b.reserved_extra_memory_b / 1000)::integer AS emem_kb,
        (a.memory_inuse_kb - b.reserved_extra_memory_b / 1000)::integer AS rmem_kb,
        a.open_file_handle_count AS fhc,
        a.thread_count AS threads
    FROM v_monitor.resource_acquisitions a
    INNER JOIN query_profiles b
        ON a.transaction_id = b.transaction_id
    WHERE
        a.transaction_id = :trxid AND
        a.statement_id = :stmtid
    ORDER BY 1, 2;

    The following query lists the query events recorded for a given query:
    SELECT
        event_timestamp, node_name, event_category, event_type,
        event_description, operator_name, path_id, event_details,
        suggested_action
    FROM v_monitor.query_events
    WHERE
        transaction_id = :trxid AND
        statement_id = :stmtid
    ORDER BY 1;

    The following query shows transaction locks:
    SELECT
        node_name, (time - start_time) AS lock_wait,
        object_name, scope, result, description
    FROM v_internal.dc_lock_attempts
    WHERE
        transaction_id = :trxid;

    The following query shows the thread count per profiled operator:
    SELECT
        node_name, path_id, operator_name,
        activity_id::varchar || ',' || baseplan_id::varchar || ',' || localplan_id::varchar AS abl_id,
        COUNT(DISTINCT operator_id) AS "#Threads"
    FROM v_monitor.execution_engine_profiles
    WHERE
        transaction_id = :trxid AND
        statement_id = :stmtid
    GROUP BY 1, 2, 3, 4
    ORDER BY 1, 2, 3, 4;

    The following query shows how you can retrieve the query execution report:
    SELECT
        node_name, operator_name, path_id,
        ROUND(SUM(CASE counter_name WHEN 'execution time (us)' THEN counter_value ELSE null END) / 1000, 3.0) AS exec_time_ms,
        SUM(CASE counter_name WHEN 'estimated rows produced' THEN counter_value ELSE null END) AS est_rows,
        SUM(CASE counter_name WHEN 'rows processed' THEN counter_value ELSE null END) AS proc_rows,
        SUM(CASE counter_name WHEN 'rows produced' THEN counter_value ELSE null END) AS prod_rows,
        SUM(CASE counter_name WHEN 'rle rows produced' THEN counter_value ELSE null END) AS rle_pr_rows,
        SUM(CASE counter_name WHEN 'consumer stall (us)' THEN counter_value ELSE null END) AS cstall_us,
        SUM(CASE counter_name WHEN 'producer stall (us)' THEN counter_value ELSE null END) AS pstall_us,
        ROUND(SUM(CASE counter_name WHEN 'memory reserved (bytes)' THEN counter_value ELSE null END) / 1000000, 1.0) AS mem_res_mb,
        ROUND(SUM(CASE counter_name WHEN 'memory allocated (bytes)' THEN counter_value ELSE null END) / 1000000, 1.0) AS mem_all_mb
    FROM v_monitor.execution_engine_profiles
    WHERE
        transaction_id = :trxid AND
        statement_id = :stmtid AND
        counter_value / 1000000 > 0
    GROUP BY 1, 2, 3
    ORDER BY
        CASE WHEN SUM(CASE counter_name WHEN 'execution time (us)' THEN counter_value ELSE null END) IS NULL THEN 1 ELSE 0 END ASC,
        5 DESC;
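
    To capture detailed counters for a specific query, you can run it with the PROFILE keyword; Vertica then reports the transaction_id and statement_id to plug into the queries above (the query below is a hypothetical placeholder):

    PROFILE SELECT count(*) FROM my_large_table;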

  • From identity provisioning to managing IoT ecosystems

    From identity provisioning to managing IoT ecosystems

    More and more enterprises have begun their journey towards digital transformation. They are creating entirely new types of digital ecosystems that include people, applications, systems and things – both inside and outside the organization. This is an exciting new world. At its heart lies a new generation of identity management technologies and mindsets.

    The pace at which organizations are embracing digital transformation is startling. The digital transformation market is estimated to reach $798.44 billion by 2025, up from $177.27 billion in 2016; that is more than a four-fold increase. Recent research from OpenText™ into the UK Financial Services sector found that over 60% of companies had either already deployed digital transformation programs or were about to.

    The identity challenge

    Digital transformation represents a massive change in the way that companies operate and conduct business. In its report “2017: A ‘transformative’ year”, AIIM states that digital transformation means re-inventing the business “from the outside in” where customer, employee and partner experiences need to be central to digital transformation initiatives. The trade body suggests: “A new generation of customers and partners, too – requires a dramatically different approach to engagement, specifically one that is personalized, immediate, expressive, and immersive.”

    That sounds fantastic, but the stumbling block is obvious. To take advantage of the opportunities of digital transformation, you need to provide access to your digital ecosystem with the assurance that everyone is who they say they are and that they have the right access to information only when they should.

    Effective identity management becomes the key enabler for successful digital transformation. However, previous approaches to identity management have primarily delivered on helping the IT Help Desk, which is an inside-out approach to identity and access management (IAM). This traditional method applies a trade-off between application security and user convenience that cannot deliver the types of experience that AIIM suggests are necessary.

    The digital ecosystem, comprising employees, customers, suppliers, partners and other stakeholders, involves too many applications and systems that are often not under the direct control of your IT department. On top of this, disruptive technologies such as IoT are adding new “things” to the ecosystem whose identity has to be provisioned and managed as well.

    Identity management: Responding to the challenge

    While reading a recent research report, I came across a recommendation from Gartner about how organizations should respond to the identity challenge in digital transformation: “Emphasize the benefits of risk-taking to Identity and Access Management innovators”. I’m not sure how many IT security professionals would be happy to take this approach – risk mitigation always seems better than risk taking for sensitive corporate data – and I’m not sure it’s necessary.

    Certainly, the business-to-employee (B2E) approach to identity management is fine if we limit ourselves to employees and the systems and cloud applications they need to connect to. I’ve written a previous blog about the need for an ‘outside-in’ model for identity. It requires a collaborative approach to delivering identity assurance, the trust that people are who they say they are, based around a new generation and mindset of identity management. Such a platform enables you to manage the entire lifecycle of internal and external users as well as their access to all resources across your extended enterprise.

    OpenText IAM

    These platforms – like OpenText™ Core Secure Access – deliver a host of intelligent features, including digital identity management, authentication management, identity event streaming and identity analytics. You have the ability to create a single, central identity for everyone and every thing that can be synchronized across devices, applications, systems and resources. This increases convenience for the user while facilitating information governance and compliance.

    Identity of Things (IoT)

    As importantly in the hyper-connected world of digital transformation, the platform goes beyond the establishment of trusted interaction between users and organizations within your digital ecosystem. It enables the secure interoperability of the different systems and things. You have an end-to-end identity infrastructure that manages access, relationships and lifecycle for every element of your digital ecosystem.

    5 key capabilities of an identity management platform

    These platforms are available today. In the case of OpenText™ Covisint, it’s the platform at the core of GM OnStar, serving over 12 million people every day. Key capabilities for an identity management platform include:

    1. Identity provisioning

    Centralizing the process of establishing digital identities for every actor in the digital ecosystem and assigning rights reduces administration and speeds up the onboarding of new users, organizations, systems and devices. The most important factor in identity provisioning is the ability to move away from identity silos where each system or application has its own rights. This also makes de-provisioning quicker and more effective: deactivating a single identity ensures all access rights are revoked.

    2. Authentication management

    While single sign-on (SSO) remains an important tool in identity management, it is no longer sufficient to meet the needs of a digital ecosystem. The vast majority of data breaches in 2017 were the result of credential-based cyber attacks. The platform should be able to deliver multi-factor authentication as well as support emerging authentication technologies such as biometrics. The most advanced platforms allow for adaptive and risk-based authentication as well as real-time provisioning.

    3. Identity federation

    Identity federation allows multiple organizations to provide access to users across systems and enterprises using the same identification data. The platform manages identity federation by establishing a trust relationship between the different parties in the ecosystem. As digital transformation progresses, identity federation capabilities grow increasingly important for establishing secure and dynamic connections between people, systems, things and services.

    4. Identity governance

    It’s essential that the identity management platform you select contains integrated identity governance capabilities. It should include features such as user administration, privileged identity management, identity intelligence, role-based identity administration and analytics. You must be able to define, enforce, review and audit identity management policies and map your identity function to regulatory compliance requirements and records retention policies.

    5. IDaaS deployment

    As companies move more services to the Cloud, Identity as a Service (IDaaS) is becoming more attractive, delivering highly secure and scalable identity management services that let organizations concentrate on developing the benefits of digital transformation in a constantly evolving digital ecosystem. Recent research showed that 57% of respondents used IDaaS for single sign-on and employee portals, while one third used the approach for mobility management and multi-factor authentication.

    With the new generation of identity management platforms, companies can choose to outsource their entire identity management capabilities to a trusted third-party service provider. With OpenText Core Secure Access, you can select on-premises, Cloud or hybrid Cloud deployment to suit your business and security requirements.

    If you’d like to know more about how identity management underpins digital transformation, it’s a key topic at Enterprise World in Toronto in July. For a personalized and private meeting, please contact us through the website or email me directly.

  • Why OEM (white-label) OpenText technology?

    Why OEM (white-label) OpenText technology?

    I know, you’re thinking “Whaaatt? An OEM partnership is not even possible!”

    Well, actually, it is possible and there are a variety of products available — from industry standard ISIS drivers to our analytics packages. You can check out the OpenText™ OEM page for information on all of the products we sell to OEM partners.

    So why would you consider an OEM partnership with OpenText?

    For the majority of industry-specialty vendors or structured data vendors, the advantage is simply that you don’t have to re-invent the wheel for a module that is not core to your product’s value proposition.

    Think of it this way:

    You could invest your research and development budget into building a completely bespoke content management solution, or even an open-source one. And, yes, you can probably make a good enough module to check off that RFI requirement or satisfy that nagging roadmap question that keeps coming up in customer advisory calls.

    But what happens when the content types change/expand/die? Do you really want to keep spending your R&D budget on upgrades to what is basically a value-add?

    Our OEM program is designed so that you not only check off that box on the RFI and customer satisfaction questionnaire, but also get a partner that actively invests — heavily — in products that are outside your sweet spot. In other words, you get a large R&D commitment on a budget that makes sense for your roadmap.

    When you work with OpenText, we not only help fill the gap you have identified in information management, but we understand how the product can do more — which lets you focus on your roadmap planning for your embedded products.

    For example, in many transitional markets like healthcare, education, and banking, reliance on paper is slowing the transformation of these industries. Two of our key products in those spaces are OpenText™ Captiva and OpenText™ AppEnhancer (formerly known as ApplicationXtender)— both of which are available as part of our OEM program. Captiva is a top-of-the-line capture solution that can capture and digitize any type of document, email, or form, and then send it via API to any type of system of record. AppEnhancer is a content services solution that electronically stores, organizes, and manages virtually any kind of business content. It’s easy to integrate into applications as a back-end service for management of documents and extraction of data related to your processes. Oh, and by the way, these two products have out-of-the-box integration, so you can have a single embedded solution for any content to store it natively and provide access via your UI.

    How would it work?

    Let’s take healthcare as an example:

    In a perfect world, organizations would have several different information management functions combined in a single, enterprise-wide health information management system with the flexibility to manage various use cases. But, many customers still use multiple systems to handle a variety of information sources. For example, financial information may be managed by the ERP or CPOE application while imaging is handled by DICOM systems.

    To remain your customers’ preferred partner, you’re going to want to offer an application to ease the movement of information across the various administrative and clinical systems. You can either invest in several connectors or you can add the varied elements of a content services platform to your application portfolio.

    I would argue that a healthcare vendor would be better off keeping up with ever-changing clinical needs (and saving their R&D dollars) by adding elements of a white-labeled content services platform.

    There are three areas where adding AppEnhancer as a white-labeled content service provides value versus building extensions to a product offering:

    1. Cost of licensing: AppEnhancer has a flexible pricing model to ensure that OEM partners can build a full business model that incorporates the additional functionality without harming revenue potential.
    2. Maintaining permissions: Most healthcare organizations have a complex cross-departmental network of admission, transcription, and billing administration that is difficult to model without an additional layer of document and process-based security.
    3. Reducing the cost of innovation: OpenText maintains a robust API and SDK kit for AppEnhancer, taking the cost out of maintaining back-end connectivity between the document management repository and your system. As an OpenText partner, you have access to a larger portfolio of integrated products for managing information, including Captiva, OpenText™ LiquidOffice™, OpenText™ RightFax™, or OpenText™ Magellan™.

    The benefit of working with OpenText as an OEM partner is that we — not you — worry about ensuring that any type of content can be made manageable by your application. Working with start-ups and large established industry veterans alike, our OEM team is dedicated to ensuring that we understand how our portfolio of products fits into your roadmap.

    Interested in learning more?

    Contact us for more information.

  • What Should I do if the Database Performance is Slow?

    What Should I do if the Database Performance is Slow?

    Troubleshoot using the following checklist if your database performance is slow.

    Check if any of the following problems exist:

    Step Task Results
    1 Is the query performance slow? If the query performance is slow, review the Query Performance checklist.

    If the query performance is not slow, go to Step 2.

    2 Is the entire database slow? If the whole database is slow, go to Step 3.

    If the whole database is not slow, your checklist is complete.

    3 Check if all the nodes are UP.
    => SELECT node_name, node_address, node_state FROM nodes WHERE node_state != 'UP';
    If there is any node DOWN,

    • Investigate why the node is down; review the Node Down checklist.
    • Restart the node.
      $ admintools -t restart_node -d <database> -s <node_address>

    If nodes restarted and the performance improved, your checklist is complete.

    If the node restarted and performance is still slow, go to Step 4.

    If the node did not restart, review the Node Down checklist.

    4 Check if there are too many delete vectors.
    => SELECT count(*) FROM delete_vectors;
    If there are more than 1000 delete vectors, review the Manage Delete Vectors checklist.

    If there are not too many delete vectors, go to Step 5.

    5 Check if epochs are advancing.
    => SELECT current_epoch, ahm_epoch, last_good_epoch, designed_fault_tolerance, current_fault_tolerance FROM system;
    If epochs are not advancing, review the AHM not Advancing checklist.

    If epochs are advancing, go to Step 6.

    6 Check if one node is slower than the others.
    Run a SELECT statement against each node in the cluster to identify whether one node is slower than the others.
    $ for host in `grep -P "^v_" /opt/vertica/config/admintools.conf | awk '{print $3}' | awk -F, '{print $1}'`; do
          echo "----- $host -----"; date; vsql -h $host -c "SELECT /*+KV*/ 1;"; date;
      done
    If one node is slower than the others,

    • Investigate host performance issue.
    • Restart the Vertica process on that node.

    Start:
    $ admintools -t restart_node -d <database> -s <slow_node_address>
    Stop:
    $ admintools -t stop_node -s <slow_node_address>

    If all the nodes have similar performance, go to Step 7.

    7 Check if the workload is balanced across all the nodes.
    => SELECT node_name,count(*) FROM dc_requests_issued WHERE
    time > sysdate() -1 group by 1 ORDER BY 1;
    If one node has a heavier workload, distribute the workload to all the nodes.
    Review the documentation on Connection Load balancing.

    If the workload is balanced, go to Step 8.

    8 Check if there are resource rejections.
    => SELECT * FROM resource_rejections ORDER BY last_rejected_timestamp;
    If there are resource rejections, review the Query Performance checklist.

    If there are no significant resource rejections that justify the slowness, go to Step 9.

    9 Check if there are sessions in queue.
    => SELECT * FROM resource_queues;
    If there are queries waiting for resources, go to Step 10.
    10 Check if there are long-running sessions that are using too many resources.
    => SELECT r.pool_name, s.node_name AS initiator_node, s.session_id,
           r.transaction_id, r.statement_id,
           max(s.user_name) AS user_name,
           max(substr(s.current_statement, 1, 100)) AS statement_running,
           max(r.thread_count) AS threads,
           max(r.open_file_handle_count) AS fhandlers,
           max(r.memory_inuse_kb) AS max_mem,
           count(DISTINCT r.node_name) AS nodes_count,
           min(r.queue_entry_timestamp) AS entry_time,
           max(r.acquisition_timestamp - r.queue_entry_timestamp) AS waiting_queue,
           max(clock_timestamp() - r.queue_entry_timestamp) AS running_time
       FROM v_internal.vs_resource_acquisitions r
       JOIN v_monitor.sessions s
         ON r.transaction_id = s.transaction_id
        AND r.statement_id = s.statement_id
       WHERE length(s.current_statement) > 0
       GROUP BY r.pool_name, s.node_name, s.session_id, r.transaction_id, r.statement_id
       ORDER BY r.pool_name;
    If a statement has been running too long and is using a high proportion of the machine's resources, consider stopping it: => SELECT interrupt_statement('session_id', 'statement_id');

    Upon statement cancellation, resources should be freed and performance should improve.

    If it does not improve, go to Step 11.

    If the statement does not terminate properly, contact Vertica Technical Support.

    11 Check if any transactions are waiting for locks.
    => SELECT * FROM locks where grant_timestamp is null;
    If transactions are waiting for locks, identify the lock-holding sessions and consider waiting for the transaction to complete or canceling the session to free the locks.
    => SELECT interrupt_statement('session_id', 'statement_id');
    Upon statement completion or cancellation and lock release, performance should improve.

    If it does not improve, go to Step 12.

    If the session does not terminate properly, contact Vertica Technical Support to report a hung session.

    12 Check the catalog size in memory.
    => SELECT node_name, max(ts) AS ts, max(catalog_size_in_MB) AS catalog_size_in_MB FROM
       ( SELECT node_name, trunc((dc_allocation_pool_statistics_by_second."time")::TIMESTAMP,
         'SS'::VARCHAR(2)) AS ts, sum((dc_allocation_pool_statistics_by_second.total_memory_max_value
         - dc_allocation_pool_statistics_by_second.free_memory_min_value))/(1024*1024)
         AS catalog_size_in_MB FROM dc_allocation_pool_statistics_by_second GROUP BY 1, 2 )
       foo GROUP BY 1 ORDER BY 1 LIMIT 50;
    If the catalog is larger than 5% of the memory on the host, adjust the resource pools to free the memory the catalog needs; otherwise the Vertica process is at risk of being terminated by the kernel's out-of-memory (OOM) killer.

    Contact Vertica Technical Support to debug the catalog size growth and to discuss alternatives for freeing memory for the catalog. Alternatives are:

    • Adjust the GENERAL pool to use less than 95% of memory.
    • Create an additional pool sized to the difference needed to accommodate the catalog.
    • Adjust the METADATA resource pool to free memory for the catalog.

    In many cases, restarting the node can free memory used by the catalog; debugging with Support will help determine the best course of action.

    13 Check the usage of resident and virtual memory, and the number of memory maps created.

    => SELECT * FROM ( SELECT time, node_name, files_open, other_open, sockets_open,
           virtual_size, resident_size, thread_count, map_count,
           row_number() OVER (PARTITION BY node_name ORDER BY time::timestamp DESC) AS row
       FROM dc_process_info ) a WHERE row <= 3;

    If virtual or resident memory usage is high, monitor it to see whether the numbers come down.

    If the numbers do not come down, contact Vertica Technical Support to debug the issue.

    Restarting the nodes should resolve the issue, but proper debugging should still be done; follow the Catalog Size Debugging checklist.

    Learn More

    Learn more about Connection Load Balancing in the Vertica Documentation.
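
    As a starting point for Step 7 above, native connection load balancing can be enabled on the server side as sketched below; clients must also opt in (for example, vsql -C):

    => SELECT SET_LOAD_BALANCE_POLICY('ROUNDROBIN');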

  • What Should I do When the Database Node is Down?

    What Should I do When the Database Node is Down?

    When a database node is DOWN, troubleshoot using the following checklist.

    Step Task Results
    1 Check whether your database is UP.
    $ admintools -t db_status -s UP
    If the database is UP, go to Step 2.

    If the database is not UP, restart your database.
    $ admintools -t start_db -d <Database_name> -p <Database_password>

    If the database starts, the checklist is complete.

    If the database does not start, see the Database Process Not Starting checklist.

    2 Identify all the DOWN nodes.
    => SELECT node_name, node_address, node_state FROM nodes WHERE node_state = 'DOWN';
    After identifying all the DOWN nodes, proceed to Step 3.
    3 Check whether you can establish a connection with the DOWN nodes using SSH.
    $ ssh dbadmin@<nodedown_ip>
    If you can SSH into the node, restart the Vertica process on the DOWN node.
    $ admintools -t restart_node -d <database_name> -s <node_host_name or IP>

    • If the restart was successful, the checklist is complete.
    • If the restart failed, go to Step 4.

    If you cannot SSH into the node, contact your system administrator to determine whether it is a port issue or a network issue.

    4 To find the reason for the restart failure, tail startup.log on the DOWN node. A snippet of the output looks like this:
    $ tail -f catalog-path/database-name/v_database-name_node_catalog/startup.log
    {
    "node" : "v_cdmt0_node0001",
    "stage" : "Database Halted",
    "text" : Data consistency problems found; Check that all file systems are properly mounted.
    Also, the --force option can be used to delete corrupted data.
    "timestamp" : "2016-07-31 18:17:04.122"
    }
    The log results show the latest state of the DOWN node. Proceed to Step 5 to see these stages.
    5 If the startup.log…

    a. Remains in the "Waiting for cluster invite" stage.

    See the Spread Debugging Checklist.
    b. Remains in the Recovery stage. See the Node Recovery Checklist.
    c. Shows the error message "Data consistency problems found": restart the node with the --force option.
    $ admintools -t restart_node -d Database_name -s node_name --force
    Upon successful restart, the checklist is complete.
    d. Shows no new data in startup.log after the last restart: check the dbLog file for errors. If you cannot resolve the errors, contact Vertica Support.
    e. Shows Shutdown Complete but the node is still DOWN.
    Tail vertica.log and look for <ERROR> and <PANIC>.
    Contact Vertica Support with the PANIC report, ErrorReport.txt, and the output of scrutinize.
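
    To gather the diagnostics mentioned above, the scrutinize utility collects logs and system information into a single bundle for Vertica Support. A minimal sketch, run as the database administrator (flags and output location vary by release):

    $ /opt/vertica/bin/scrutinize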
  • What Version of Vertica am I Running?

    What Version of Vertica am I Running?

    The built-in VERSION function returns a VARCHAR that contains your Vertica node’s version information.

    Example:

    dbadmin=> SELECT version();
    version
    ------------------------------------
    Vertica Analytic Database v9.1.0-2
    (1 row)

    The Vertica version is formatted as X.Y.Z-R, where…

    • X.Y is the two-digit Vertica major release number, e.g., 8.1, 9.0 and 9.1
    • Z is the service pack number, e.g., 7.2.3, 8.1.1 and 9.0.1
    • R is the hotfix number, e.g., 9.0.1-9 and 9.1.0-2
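
    You can also check the version from the command line without opening an interactive session (a sketch, assuming the default installation path):

    $ /opt/vertica/bin/vsql -c "SELECT version();"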

    Have Fun!
