Identifying Payment Cards at Rest – Going Beyond the Key Word Search

In this blog, I’ll be discussing an approach for locating payment card numbers stored in MSSQL databases without relying on key words for data discovery.

To overcome the impracticality of pulling an entire database over the wire for advanced analysis, we’ll focus on using MSSQL’s native capability to filter out items that can’t contain cardholder data. This will greatly reduce the need for local checks.

The Pattern in the Number

Before we can begin, we need to understand what we’re looking for.  For this exercise the focus will be on four card types; Visa, MasterCard, American Express, and Discover. These cards all have known lengths and starting sequences.

Card TypeLength (digits)Starting Sequence
MasterCard1650-55, 222100-272099
American Express1534 or 37
Discover166011 or 65

*Visa issued 13 digit cards in the past, but those cards are no longer valid.

*MasterCard started issuing 2-series BINs (222100-272099) January 2017.  The code examples below have not been updated to support these numbers.

The first 6 digits of each of these cards make up the IIN (Issuer Identification Number) also known as the BIN (Bank Identification Number). Card issuers don’t generally provide official lists of IINs, but several community driven efforts to catalog this information exist. A good example can be found here:

The next 1-5 digits are known as the account range. The account range is followed by the customer identification number (CIN) and the check digit. Although the account range and CIN are going to be unknowns, the check digit is generated using a mathematical formula, and thus can be validated.

IINAccount Range & Customer Identification NumberCheck Digit
4111 1111 1111 1111

For more ideas about how to leverage the check digit, I recommend reading this post made by Karl Fosaaen.

MSSQL Filtering

We can’t positively identify payment cards with MSSQL’s native pattern matching capability, but we can prove several negatives that will allow us to eliminate tables, columns, and even individual cells that don’t contain payment cards.

The first thing we need to do is query the MSSQL server for a list of available databases. Filtering out default system databases is the first step in cutting down the amount of content we’ll look at locally.

SQL – List available databases

USE master;
(NAME NOT LIKE 'distribution') AND (NAME NOT LIKE 'master') AND 
(NAME NOT LIKE 'publication') AND (NAME NOT LIKE 'reportserver') AND 
(NAME NOT LIKE 'reportservertempdb') AND (NAME NOT LIKE 'resource') AND 
(NAME NOT LIKE 'tempdb') 

Excluded databases

(distribution, master, model, msdb, publication, reportserver, reportservertempdb, resource, tempdb)

The next step is to list the tables that may contain payment card data for the remaining databases; again, eliminating system defaults.

The SQL examples provided were created for the AdventureWorks2014 database, made freely available by Microsoft.

SQL – List tables

USE AdventureWorks2014;
SELECT '[' + SCHEMA_NAME(t.schema_id) + '].[' + + ']' AS fulltable_name, 
SCHEMA_NAME(t.schema_id) AS schema_name, AS table_name, 
i.rows FROM sys.tables AS t INNER JOIN sys.sysindexes AS i ON t.object_id = AND 
i.indid < 2 WHERE (ROWS > 0) AND ( NOT LIKE 'syscolumns') AND 
( NOT LIKE 'syscomments') AND ( NOT LIKE 'sysconstraints') AND 
( NOT LIKE 'sysdepends') AND ( NOT LIKE 'sysfilegroups') AND 
( NOT LIKE 'sysfiles') AND ( NOT LIKE 'sysforeignkeys') AND 
( NOT LIKE 'sysfulltextcatalogs') AND ( NOT LIKE 'sysindexes') AND 
( NOT LIKE 'sysindexkeys') AND ( NOT LIKE 'sysmembers') AND 
( NOT LIKE 'sysobjects') AND ( NOT LIKE 'syspermissions') AND 
( NOT LIKE 'sysprotects') AND ( NOT LIKE 'sysreferences') AND 
( NOT LIKE 'systypes') AND ( NOT LIKE 'sysusers') 

Excluded Tables:

(syscolumns, syscomments, sysconstraints, sysdepends, sysfilegroups, sysfiles, sysforeignkeys, sysfulltextcatalogs, sysindexes, sysindexkeys, sysmembers, sysobjects, syspermissions, sysprotects, sysreferences, systypes, sysusers)

Now we’ll list columns for each table, this time filtering on column length and data type. For this example, we’ll focus on the “CreditCard” table.

SQL – List columns

USE AdventureWorks2014;
DATA_TYPE NOT IN ('bigint','binary','bit','cursor','date','datetime','datetime2',
'tinyint','uniqueidentifier','varbinary','xml') AND 
TABLE_NAME='CreditCard' OR 
DATA_TYPE NOT IN ('bigint','binary','bit','cursor','date','datetime','datetime2',
'tinyint','uniqueidentifier','varbinary','xml') AND 

Excluded Data Types:

(bigint, binary, bit, cursor, date, datetime, datetime2, datetimeoffset, float, geography, hierarchyid, image, int, money, real, smalldatetime, smallint, smallmoney, sql_variant, table, time, timestamp, tinyint, uniqueidentifier, varbinary, xml)

The last set of server side filters we’ll apply take advantage of the weak pattern matching available in MSSQL to eliminate cells that don’t match know card formats.

SQL- Apply MSSQL pattern matching

/* create temp table with appropriate columns and data types */

CREATE TABLE #dataloc (RowNumber INT IDENTITY(1,1), "CardNumber" nvarchar(25));

/* populate temp table with data that matches payment card formats */

INSERT INTO #dataloc 
Select "CardNumber" FROM [Sales].[CreditCard] WHERE "CardNumber" LIKE 
UNION Select "CardNumber" FROM [Sales].[CreditCard] WHERE "CardNumber" LIKE 
UNION Select "CardNumber" FROM [Sales].[CreditCard] WHERE "CardNumber" LIKE 
UNION Select "CardNumber" FROM [Sales].[CreditCard] WHERE "CardNumber" LIKE 

This SQL copies matching rows to a temp table and adds row numbers for later use.

Local Validation Testing

We now have a temp table filled exclusively with rows of data containing 15-16 digits that loosely match known payment card patterns.

It’s time to start pulling content over the wire for local processing. The row numbers assigned earlier will be used to break the dataset into chunks.

SQL – Querying potential card numbers

SELECT * FROM #dataloc WHERE (RowNumber >=1 AND RowNumber <=4000) ORDER BY RowNumber;

Now that we have full cell data, it’s time to use regex to extract all potential payment card numbers. It’s entirely possible that some cells will contain multiple payment cards.

American Express(?<![0-9])3\D{0,4}(4|7)(\D{0,4}\d){13}[^0-9]

If you’re familiar with regex, you may have noticed that these patterns exclude matches that are immediately prepended or followed with a digit. Although this will ultimately result in some false negatives, extensive real world testing has shown that these matches are almost always false positives.

The matches extracted using the above regex are then subjected to Luhn validation checks. Any regex match that fails the Luhn check is immediately thrown out.

Luhn Validation, invented in 1954 by Hans Peter Luhn –


Testing has shown that this approach has the potential to provide higher accuracy when compared to a keyword search. It also enables us to locate payment card numbers in unexpected places such as free form notes fields, or repurposed legacy columns, but this accuracy comes with a price. Database performance loads and scan times are significantly greater than those generated by the keyword search offered by PowerUpSQL making the best approach dependent on your specific use case.

In the next few weeks I’ll be releasing a tool that serves as a proof of concept for the search method discussed in this post. In the meantime, if you’d like to read more about locating sensitive data with Scott Sutherland’s PowerUpSQL, you can do that here:

Discover why security operations teams choose NetSPI.