When I was a litigation paralegal, searching an eDiscovery database for terms that included numbers and symbols was always a hassle, and the need came up often. Searches for currency values, product numbers, serial numbers, patent numbers, and the like are common. How does your eDiscovery software handle searching for numbers and special characters?
I’ve seen a variety of approaches across different software applications. Some ignore special characters; some treat them as a space that must be accounted for in your search; and some let you index special characters on a case-by-case basis. These inconsistencies make searching for numbers that include characters such as commas, semicolons, or dollar signs very difficult.
More sophisticated applications such as OpenText™ Axcelerate Review & Analysis™ (“Axcelerate”) support searching for many special characters out of the box. Axcelerate even adds a layer of analytics to searches for characters associated with numeric values. More on that below.
Punctuation and Other Special Characters
Within Axcelerate, many characters are always fully searchable. These characters are indexed both separately and together with their closest adjacent terms, which allows for a fully functional search experience that includes:
- Currency values ($, ¥, etc.)
- Numbers associated with special characters (e.g., =, +, -, %)
- Email addresses
- Folder paths
- Trademark, copyright, registered, and other common symbols (™, ©, ®)
- Company names that include special characters (e.g., Edward & Jones)
You can also search for these same terms without the symbols, and the results will include both variations. For example, searching for “Edward & Jones” returns only results that include the ampersand, whereas searching for “Edward Jones” finds both “Edward Jones” and “Edward & Jones”.
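To make that matching behavior concrete, here is a minimal Python sketch. It is a toy model built on my own assumptions, not Axcelerate’s implementation: symbols are kept as separate tokens, a query that contains a symbol must match it in place, and a symbol-free query is compared against the text with symbols dropped. The function names (`tokenize`, `matches`, and so on) are hypothetical.

```python
import re

# Toy tokenizer (an assumption, not Axcelerate's): runs of letters/digits
# form words, and every other non-space character becomes its own token.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return [t.lower() for t in TOKEN.findall(text)]


def is_symbol(token):
    # Anything that is not a run of word characters counts as a symbol.
    return re.fullmatch(r"\w+", token) is None


def contains_phrase(tokens, phrase):
    # True if phrase appears as a contiguous run inside tokens.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches(document, query):
    doc = tokenize(document)
    q = tokenize(query)
    if any(is_symbol(t) for t in q):
        # The query names a symbol, so the symbol must appear in place.
        return contains_phrase(doc, q)
    # Symbol-free query: compare against the document with symbols dropped,
    # so "Edward Jones" also finds "Edward & Jones".
    words_only = [t for t in doc if not is_symbol(t)]
    return contains_phrase(words_only, q)


print(matches("Letter from Edward & Jones", "Edward & Jones"))  # True
print(matches("Letter from Edward Jones", "Edward & Jones"))    # False
print(matches("Letter from Edward & Jones", "Edward Jones"))    # True
```

Keeping two views of the same text (with and without symbols) is what lets a narrow query stay narrow while a broad query picks up both spellings.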
Punctuation Within the Body of Text
In addition, some punctuation characters are searchable when they appear within the body of a term without spaces (that is, not at the beginning or end of a word). These include the following characters:
Punctuation Found Within Body of Term
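Here is a rough Python sketch of the idea: punctuation is trimmed from the edges of a term but left in place inside it, so an interior hyphen or apostrophe stays searchable. The `INTERNAL_PUNCT` set and the helper names are hypothetical placeholders, not the product’s actual character list or API.

```python
# Hypothetical set of characters that stay searchable only inside a term.
# The actual list is product-specific; these are placeholders.
INTERNAL_PUNCT = "-'./"
# Other punctuation that is simply trimmed from term edges.
EDGE_ONLY = ",;:!?\"()"


def index_terms(text):
    terms = []
    for raw in text.split():
        # strip() only touches the ends of the string, so interior
        # characters such as the hyphen in "A-123" survive.
        term = raw.strip(INTERNAL_PUNCT + EDGE_ONLY)
        if term:
            terms.append(term.lower())
    return terms


def search(text, term):
    return term.lower() in index_terms(text)


print(index_terms("See model A-123. Call O'Brien."))
# ['see', 'model', 'a-123', 'call', "o'brien"]
```

With this approach, a search for “A-123” finds the model number, while a leading hyphen in “-123” is trimmed away at index time, so only “123” is searchable there.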
Special characters that also serve as Axcelerate operators must be enclosed in quotes to distinguish them from their functional purpose. For example, the # symbol indicates a stemmed search when placed at the beginning of a word, but it can also be searched as a literal character when enclosed in quotes, such as “#”.
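The quoting rule works like a small escape mechanism in the query parser. The sketch below assumes a simplified syntax in which an unquoted leading `#` means a stemmed search and quotes (straight or curly) force a literal match; `parse_term` is a hypothetical helper, not part of any Axcelerate API.

```python
# Hypothetical query syntax: a leading "#" requests a stemmed search,
# and quotes (straight or curly) force a literal match.
OPEN_QUOTES = '"\u201c'
CLOSE_QUOTES = '"\u201d'


def parse_term(term):
    """Return (mode, text) for one query term."""
    if len(term) >= 2 and term[0] in OPEN_QUOTES and term[-1] in CLOSE_QUOTES:
        # Quoted: everything inside, including "#", is a literal character.
        return ("literal", term[1:-1])
    if len(term) > 1 and term.startswith("#"):
        # Unquoted leading "#": treat it as the stemming operator.
        return ("stemmed", term[1:])
    return ("literal", term)


print(parse_term("#run"))             # ('stemmed', 'run')
print(parse_term('"#"'))              # ('literal', '#')
print(parse_term("\u201c#\u201d"))    # ('literal', '#')
```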
Currency Characters and Symmetric Matching
Symmetric matching uses analytics to locate currency characters (e.g., $, ¥) or currency ISO codes (e.g., USD, CNY) adjacent to a number, regardless of whether the character or code appears before or after the number. Below are some example queries and expected return results:
Other numeric-related symbols searched with symmetric matching include:
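Conceptually, symmetric matching normalizes the currency marker and the amount into one comparable value, no matter which side of the number the marker sits on. The following Python sketch models that idea using a tiny hypothetical symbol-to-ISO table and regular expressions of my own; it is an illustration, not a description of Axcelerate’s internal analytics.

```python
import re

# Hypothetical symbol-to-ISO table. Real mappings are ambiguous ("$" is also
# CAD, AUD, ...), so a production system would need more context than this.
SYMBOL_TO_ISO = {"$": "USD", "€": "EUR", "£": "GBP"}
CURRENCY = "|".join(
    [re.escape(s) for s in SYMBOL_TO_ISO] + sorted(SYMBOL_TO_ISO.values())
)
AMOUNT = r"\d+(?:[.,]\d+)*"

# The currency may sit before the amount ("$100", "USD 100")
# or after it ("100$", "100 USD").
PATTERN = re.compile(
    rf"(?P<pre>{CURRENCY})\s?(?P<amt1>{AMOUNT})"
    rf"|(?P<amt2>{AMOUNT})\s?(?P<post>{CURRENCY})"
)


def currency_values(text):
    """Return a set of (ISO code, amount) pairs found in text."""
    found = set()
    for m in PATTERN.finditer(text):
        code = m.group("pre") or m.group("post")
        amount = m.group("amt1") or m.group("amt2")
        found.add((SYMBOL_TO_ISO.get(code, code), amount))
    return found


def symmetric_match(query, document):
    # Match when the query and document share a (currency, amount) pair,
    # whichever side of the number the currency marker was written on.
    return bool(currency_values(query) & currency_values(document))


print(symmetric_match("$100", "Invoice total: 100 USD"))  # True
print(symmetric_match("$100", "paid €100 upfront"))       # False
```

Normalizing both the query and the document to the same (currency, amount) form is what makes the match symmetric: the order in which the author typed the symbol and the number stops mattering.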
Numeric Separator Characters
Depending on the country of origin, numbers are often displayed with thousands separators, most commonly commas, periods, or apostrophes. Decimal separators are likewise commonly shown as periods or commas. In Axcelerate, you can run a stemmed search (using either Stem mode or by placing # before the search term) to find numeric values regardless of the separators used. For example, below are the expected return results for searching #12345675:
Stemmed Search of Numeric Separators
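A stemmed numeric search can be approximated by stripping separators before comparing numbers. This Python sketch assumes comma, period, or apostrophe thousands separators in consistent three-digit groups, uses hypothetical helper names, and deliberately ignores the decimal-versus-thousands ambiguity (it would treat “1.234” as 1234).

```python
import re

SEPARATORS = ",.'"
# Assumed shape of a grouped number: a 1-3 digit lead group, then one or
# more 3-digit groups that all reuse the same separator.
GROUPED = re.compile(r"\b\d{1,3}([,.'])\d{3}(?:\1\d{3})*\b")
PLAIN = re.compile(r"\b\d+\b")


def strip_separators(s):
    # Delete every separator character from s.
    return s.translate(str.maketrans("", "", SEPARATORS))


def numeric_keys(text):
    """Return separator-free forms of the numbers found in text."""
    keys = {m.group(0) for m in PLAIN.finditer(text)}
    for m in GROUPED.finditer(text):
        keys.add(strip_separators(m.group(0)))
    return keys


def stemmed_number_search(query, text):
    # Hypothetical stand-in for a "#"-prefixed numeric query.
    return strip_separators(query.lstrip("#")) in numeric_keys(text)


for sample in ["12,345,675", "12.345.675", "12'345'675", "12345675"]:
    print(sample, stemmed_number_search("#12345675", f"Ref {sample} attached"))
# each line ends in True
```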
A More Expansive Search Strategy
Using numeric and special-character search terms tactically with these features gives us eDiscovery users the flexibility to be more surgical or more expansive in our search strategies. I’m adding this to my list of features I wish I’d had as a paralegal, knowing how much time it would have saved when I was hunting for that one patent application or a series of product model numbers.