In this blog, I’ll be discussing an approach for locating payment card numbers stored in MSSQL databases without relying on key words for data discovery.
To overcome the impracticality of pulling an entire database over the wire for advanced analysis, we’ll focus on using MSSQL’s native capability to filter out items that can’t contain cardholder data. This will greatly reduce the need for local checks.
The Pattern in the Number
Before we can begin, we need to understand what we’re looking for. For this exercise the focus will be on four card types; Visa, MasterCard, American Express, and Discover. These cards all have known lengths and starting sequences.
Card Type
Length (digits)
Starting Sequence
Visa
16
4
MasterCard
16
50-55, 222100-272099
American Express
15
34 or 37
Discover
16
6011 or 65
*Visa issued 13 digit cards in the past, but those cards are no longer valid.
*MasterCard started issuing 2-series BINs (222100-272099) January 2017. The code examples below have not been updated to support these numbers.
The first 6 digits of each of these cards make up the IIN (Issuer Identification Number) also known as the BIN (Bank Identification Number). Card issuers don’t generally provide official lists of IINs, but several community driven efforts to catalog this information exist. A good example can be found here: https://www.stevemorse.org/ssn/List_of_Bank_Identification_Numbers.html
The next 1-5 digits are known as the account range. The account range is followed by the customer identification number (CIN) and the check digit. Although the account range and CIN are going to be unknowns, the check digit is generated using a mathematical formula, and thus can be validated.
We can’t positively identify payment cards with MSSQL’s native pattern matching capability, but we can prove several negatives that will allow us to eliminate tables, columns, and even individual cells that don’t contain payment cards.
The first thing we need to do is query the MSSQL server for a list of available databases. Filtering out default system databases is the first step in cutting down the amount of content we’ll look at locally.
SQL – List available databases
USE master;
SELECT NAME FROM sysdatabases WHERE
(NAME NOT LIKE 'distribution') AND (NAME NOT LIKE 'master') AND
(NAME NOT LIKE 'model') AND (NAME NOT LIKE 'msdb') AND
(NAME NOT LIKE 'publication') AND (NAME NOT LIKE 'reportserver') AND
(NAME NOT LIKE 'reportservertempdb') AND (NAME NOT LIKE 'resource') AND
(NAME NOT LIKE 'tempdb')
ORDER BY NAME;
The next step is to list the tables that may contain payment card data for the remaining databases; again, eliminating system defaults.
The SQL examples provided were created for the AdventureWorks2014 database, made freely available by Microsoft. https://msftdbprodsamples.codeplex.com/releases/view/125550
SQL – List tables
USE AdventureWorks2014;
SELECT '[' + SCHEMA_NAME(t.schema_id) + '].[' + t.name + ']' AS fulltable_name,
SCHEMA_NAME(t.schema_id) AS schema_name, t.name AS table_name,
i.rows FROM sys.tables AS t INNER JOIN sys.sysindexes AS i ON t.object_id = i.id AND
i.indid < 2 WHERE (ROWS > 0) AND (t.name NOT LIKE 'syscolumns') AND
(t.name NOT LIKE 'syscomments') AND (t.name NOT LIKE 'sysconstraints') AND
(t.name NOT LIKE 'sysdepends') AND (t.name NOT LIKE 'sysfilegroups') AND
(t.name NOT LIKE 'sysfiles') AND (t.name NOT LIKE 'sysforeignkeys') AND
(t.name NOT LIKE 'sysfulltextcatalogs') AND (t.name NOT LIKE 'sysindexes') AND
(t.name NOT LIKE 'sysindexkeys') AND (t.name NOT LIKE 'sysmembers') AND
(t.name NOT LIKE 'sysobjects') AND (t.name NOT LIKE 'syspermissions') AND
(t.name NOT LIKE 'sysprotects') AND (t.name NOT LIKE 'sysreferences') AND
(t.name NOT LIKE 'systypes') AND (t.name NOT LIKE 'sysusers')
ORDER BY TABLE_NAME;
Now we’ll list columns for each table, this time filtering on column length and data type. For this example, we’ll focus on the “CreditCard” table.
SQL – List columns
USE AdventureWorks2014;
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE
CHARACTER_MAXIMUM_LENGTH > 14 AND
DATA_TYPE NOT IN ('bigint','binary','bit','cursor','date','datetime','datetime2',
'datetimeoffset','float','geography','hierarchyid','image','int','money','real',
'smalldatetime','smallint','smallmoney','sql_variant','table','time','timestamp',
'tinyint','uniqueidentifier','varbinary','xml') AND
TABLE_NAME='CreditCard' OR
CHARACTER_MAXIMUM_LENGTH < 1 AND
DATA_TYPE NOT IN ('bigint','binary','bit','cursor','date','datetime','datetime2',
'datetimeoffset','float','geography','hierarchyid','image','int','money','real',
'smalldatetime','smallint','smallmoney','sql_variant','table','time','timestamp',
'tinyint','uniqueidentifier','varbinary','xml') AND
TABLE_NAME='CreditCard' ORDER BY COLUMN_NAME;
The last set of server side filters we’ll apply take advantage of the weak pattern matching available in MSSQL to eliminate cells that don’t match know card formats.
SQL- Apply MSSQL pattern matching
/* create temp table with appropriate columns and data types */
CREATE TABLE #dataloc (RowNumber INT IDENTITY(1,1), "CardNumber" nvarchar(25));
/* populate temp table with data that matches payment card formats */
INSERT INTO #dataloc
Select "CardNumber" FROM [Sales].[CreditCard] WHERE "CardNumber" LIKE
('%4%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%')
UNION Select "CardNumber" FROM [Sales].[CreditCard] WHERE "CardNumber" LIKE
('%5%[1-5]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%')
UNION Select "CardNumber" FROM [Sales].[CreditCard] WHERE "CardNumber" LIKE
('%3%[47]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%')
UNION Select "CardNumber" FROM [Sales].[CreditCard] WHERE "CardNumber" LIKE
('%6%[05]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%[0-9]%')
This SQL copies matching rows to a temp table and adds row numbers for later use.
Local Validation Testing
We now have a temp table filled exclusively with rows of data containing 15-16 digits that loosely match known payment card patterns.
It’s time to start pulling content over the wire for local processing. The row numbers assigned earlier will be used to break the dataset into chunks.
SQL – Querying potential card numbers
SELECT * FROM #dataloc WHERE (RowNumber >=1 AND RowNumber <=4000) ORDER BY RowNumber;
Now that we have full cell data, it’s time to use regex to extract all potential payment card numbers. It’s entirely possible that some cells will contain multiple payment cards.
If you’re familiar with regex, you may have noticed that these patterns exclude matches that are immediately prepended or followed with a digit. Although this will ultimately result in some false negatives, extensive real world testing has shown that these matches are almost always false positives.
The matches extracted using the above regex are then subjected to Luhn validation checks. Any regex match that fails the Luhn check is immediately thrown out.
Testing has shown that this approach has the potential to provide higher accuracy when compared to a keyword search. It also enables us to locate payment card numbers in unexpected places such as free form notes fields, or repurposed legacy columns, but this accuracy comes with a price. Database performance loads and scan times are significantly greater than those generated by the keyword search offered by PowerUpSQL making the best approach dependent on your specific use case.
In the next few weeks I’ll be releasing a tool that serves as a proof of concept for the search method discussed in this post. In the meantime, if you’d like to read more about locating sensitive data with Scott Sutherland’s PowerUpSQL, you can do that here: https://blog.netspi.com/finding-sensitive-data-domain-sql-servers-using-powerupsql/
Necessary cookies help make a website usable by enabling basic functions like page navigation and access to secure areas of the website. The website cannot function properly without these cookies.
Name
Domain
Purpose
Expiry
Type
YSC
youtube.com
YouTube session cookie.
52 years
HTTP
Marketing cookies are used to track visitors across websites. The intention is to display ads that are relevant and engaging for the individual user and thereby more valuable for publishers and third party advertisers.
Name
Domain
Purpose
Expiry
Type
VISITOR_INFO1_LIVE
youtube.com
YouTube cookie.
6 months
HTTP
Analytics cookies help website owners to understand how visitors interact with websites by collecting and reporting information anonymously.
We do not use cookies of this type.
Preference cookies enable a website to remember information that changes the way the website behaves or looks, like your preferred language or the region that you are in.
We do not use cookies of this type.
Unclassified cookies are cookies that we are in the process of classifying, together with the providers of individual cookies.
We do not use cookies of this type.
Cookies are small text files that can be used by websites to make a user's experience more efficient. The law states that we can store cookies on your device if they are strictly necessary for the operation of this site. For all other types of cookies we need your permission. This site uses different types of cookies. Some cookies are placed by third party services that appear on our pages.
Cookie Settings
Discover why security operations teams choose NetSPI.