Tuesday, April 12, 2005

More FTP... MORE!

Well I've recently been asked to summarize our ftp usage, so that we can decide how expensive it would be to move to a two-tier encrypted FTP solution. It's mostly to do with government regulations and securing data for audit purposes. (Not that securing the data for client purposes is enough of a motivation...)

So, as I've mentioned somehwere, I log our FTP traffic to a SQL server. It runs a nightly import job to concatenate the log files and import them to the database. I could theoretically use ODBC logging directly to SQL, but I prefer the W3C logging format that IIS offers. Dump that into SQL and you have a nice, flexible DB.

With a few exceptions, that I recently discovered.

The problems I ran into were when I decided to start doing some serious reporting. I found out that I had been importing most of the fields as character data. So asking it to average out or sum up the number of transmitted bytes over a week period errored out. You can't aggregate character data, after all. That and doing calculations by date or time yielded the same results. So I had to manually alter the database columns to reflect the correct data type before I could import.

Well, I'll work on that part later. For now, here's how I get the data in.

First, The DTS package runs a batch file that does the following:
----------
del D:\admin\ftplogs\*.bak
copy D:\admin\ftplogs\*.txt D:\admin\ftplogs\*.bak
copy C:\winnt\system32\logfiles\msftpsvc1\ex*.log D:\admin\ftplogs\ftp_dmz1.log
copy D:\admin\ftplogs\*.log D:\admin\ftplogs\full_log.txt
del D:\admin\ftplogs\*.log
----------
Next, we do a 'TRUNCATE TABLE FTPLOG" to clear the table, then import the concatenated log file (called full_log.txt) into SQL using a data transform. Copy Column for Column.

Finally, we need to clean a few things up. This works for me, for now:
----------
delete from ftplog where datestamp like '#%'
delete from ftplog where UTCTime = 'Microsoft'
----------
That clears out the rest of the errant data in the logs. Now that it's clean, we can work on getting the data we need out of it. Remember, I'm looking for three things. Date, number of files transmitted that day, and total traffic in MB. Here's how that works:

First, I created a table like this:
----------
CREATE TABLE ftpcalcs (
datestamp datetime NOT NULL,
filecount int NOT NULL,
mbytes int NOT NULL
)
----------

Remember, since integer and datetime columns have a preset length, you don't have to specify the length. If it were a character field, then you'd have to set the length. For W3C date field logging where the date is a 4 digit year, dash, two digit month, dash, two digit day, the type would be "char(10)" instead of "datetime".

Now to populate it, we run this:

----------
truncate table ftpcalcs
go

declare @myday datetime
select @myday = '04/19/2004'

while @myday < '04/12/2005'
begin
insert into ftpcalcs
select datestamp,
count(*) filecount,
((sum(csbytes)/1024)/1024) mbytes
from ftplog
where datestamp = @myday and
csbytes != 0
group by datestamp
order by datestamp

select @myday = dateadd(dd,1,@myday)
end
go
----------

That takes each day in turn from the beginning to the end of the date range, which must be altered manually. For each day, run that select and insert the result into the ftpcalcs table. Then you can just run a select on the ftpcalcs table for whatever data you want.

A couple of things I learned while doing this. If you're going to use aggregate functions in a select statement, you can either use them all by themselves, or if you want to use non-aggregate data, you'll need to group by the non-aggregate column, and order by that column as well. Otherwise, you error out. Also, aggregate functions, at least in this case, don't like INT data types. Numeric or Decimal worked fine when I was pulling from the FTPLOG table, but INT would return overflow errors every time. No idea why.

There you have it. Enjoy, Play, Have Fun. And if you improve on this, EMAIL ME, PLEASE! I need all the help I can get...

Thanks.
-D.

Due to the fact that I'll be damned if I can get blogger to display this frikken code properly, I've cross posted this at Maxwell's Alley.

No comments: