Categories

A sample text widget

Etiam pulvinar consectetur dolor sed malesuada. Ut convallis euismod dolor nec pretium. Nunc ut tristique massa.

Nam sodales mi vitae dolor ullamcorper et vulputate enim accumsan. Morbi orci magna, tincidunt vitae molestie nec, molestie at mi. Nulla nulla lorem, suscipit in posuere in, interdum non magna.

sql for sampling data from tables

Some time ago, I had the requirement to sample data from a table to do validation work.  Basically, we wanted to validate syncronized data with a base table, but could not afford to compare every row.  We had to come up with a way to ‘sample’ the data.  Each table we had to sample had a different number of rows and in addition to that were growing.  The mechanizm had to be simple, yet take into account table size.  I did not want to do a custom query for every table.  I wanted something somewhat generic – something that I could say check ‘x’ rows from this table, not check every ‘nth’ row which would mean that the result would grow over time.

The obvious first consideration was to number the rows and do it from that, but considering we were doing this in a program, I needed something a little more elegant.  Here is the sql I came up with.

SELECT E.* FROM
(SELECT ROWNUMBER() OVER (ORDER BY EMPNO) AS ROWNBR, EMPNO FROM EMPLOYEE ) TEMP,
(SELECT INTEGER(ROUND((COUNT(*)/23.0),0)) AS HOWMANY FROM EMPLOYEE) TEMP2,
EMPLOYEE E
WHERE MOD(ROWNBR,HOWMANY)=0 AND E.EMPNO=TEMP.EMPNO

So here is what we do,
TEMP is an ordered list of the table key – we number each row and order by the empno (or whatever key)
TEMP2 is a cartesian join of the count of the table and how many rows you want.  In this case it is 23 rows
E is the table we want and join to the above
The where clause is the key – we only take rows that have a modulus of 0 when you devide the rownumber by how many rows you want

Notes:

  1. This was written for the sample database employee table, however I did have to modify it to have more than 42 rows
  2. Small tables will have inaccurate results just because of the math so this works best on larger tables
  3. The number of rows returned is not exactly what was requested.  Once again because of the math, but it is usally close.  I found the larger the table, the more accurate the result.  You can add a ‘FETCH FIRST n ROWS ONLY’ (where ‘n’ is your value) so you don’t get more than you want.
  4. How often do you find a legitimate use for a cartesian join?
  5. Don’t ask for more rows than are in the table – this can cause errors with the mod function

Leave a Reply

Buy Generic Viagra USA
Where TO Buy Viagra Online
Buy Viagra Where
Cheap Cheap Viagra
Line Viagra
Buy Discount Viagra
Cheapest viagra online Pills
Buy Viagra Sample
Cheap Viagra Online
Cheap Viagra FAST Shipping
Viagra for Sale
Buy Viagra New York
Cheap Websites For Viagra
Viagra FOR Sales Online
Cheapest Viagra Prices
Cheap Viagra Pills