Família Dammous  
English version English version   Home | Projetos | Artigos | Cadastro | OnLine | Sumário | Contato | Chat Terça-feira, 16/Abril/2024
Conecte-se  /dev :: home :: articles :: statisticus - part #5   
 Música
 Notícias
 Meteorologia
 Genealogia
 Astrologia
 Nutrição

 » Adiciona Favoritos



3 usuários online »»» 3 convidados - mais... Lista de Membros 1544 membros

Statisticus, Why and How?
Part V - Data Normalization

by Hakan Eskici

5. Data Normalization

We need to find a way to minimize the storage for our data. By analysing the raw data inserted into the database, we can see that some field values are repeated many times. For example, Path field values only includes the page names in our site. But the records in the Stats table have many of repeated path names.

So let's give unique ID's for each path.

Let index.asp has the PathID of 1, so when there is an access to index.asp to be inserted into our Stats table, we will insert the ID 1 instead of page name.

Now we have a PathID field instead of Path field. PathID field will have unique numbers each corresponding to a different page name.

Here is a question now. How are we going to know which path name corresponds to which PathID? We will have another table called Paths, having the fields PathID and PathName.

For our example;

Table Paths: PathID PathName ------ -------- 1 index.asp 2 page1.html 3 login.asp . ...

By linking the Paths and Stats tables togetger, we will reduce the number of bytes to store the same data.

Table Stats: StatID PathID ... (other fields) ------ ------ 1 1 2 3 3 1 4 2 5 1

So we see that for stat record #1, the requested path ID is #1 which corresponds to index.asp.

Now here is another question. How are we going to manage the table Paths if we have lots of pages? We won't! We will add some more code for automatic management of Paths table.

The idea is simple, whenever there is a request for a page, we will first look for its name in the Paths table. If the page has not been requested before, we will add a record for it and get the newly insterted record's PathID for a reference. If the page has been requested before, we already have a PathID for it.

We will write a function called GetIDPath for this purpose;

Function GetIDPath(sName) 'Construct SQL Query sSQL = "SELECT PathID, PathName FROM Paths" sSQL = sSQL & "WHERE PathName = '" & sName & "'" 'Open the recordset rs.Open sSQL,,,adCmdTable if rs.RecordCount = 0 then 'If there is no record, add one rs.AddNew rs("PathName") = sName end if 'Whether it's found or added, return ID GetIdPath = rs("PathID") rs.close end function

We will use similar functions for Browser, Referer, Color, Screen resolution and Operating system. These functions are almost similar to each other except that the tables they access differs.

But let's first make another improvement to our database structure by adding one more field to each of the Browsers, Referers, Colors, Resolutions and Operating Systems tables.

The field is called "Total" which will store the number of times accessed for a specific record. Therefore, we will have a easier way to learn the access statistics.

Although this could already be done by executing a SQL query which has COUNT function; using Total field method presents less overhead for execution time.

Now, our GetIdPath function will look like:

Function GetIDPath(sName) 'Construct SQL Query sSQL = "SELECT PathID, PathName FROM Paths" sSQL = sSQL & "WHERE PathName = '" & sName & "'" 'Open the recordset rs.Open sSQL,,,adCmdTable if rs.RecordCount = 0 then 'If there is no record yet, add one. rs.AddNew rs("PathName") = sName end if 'Increment the total access rs("Total") = rs("Total") + 1 rs.Update 'Whether it's found or added, return ID GetIdPath = rs("PathID") rs.close end function

We are almost ready for our Statistics application. We collected a bunch of data from the visitors, now it's time to analyse it to extract useful information.

In the next part of the article, we will discuss the methods to convert data into information.

Part I - Introduction
Part II - Measurement Tools
Part III - Simple Statistics
Part IV - Adding More Fields
Part V - Data Normalization
Part VI - Data Gets Valuable




 Linux
 Programas
 ASP
 Jogos
 Cartão Postal

Frase
O aumento da sabedoria pode ser medido com exatidão pela diminuição do mau humor.
Nietzsche



(avançado...)

IP Address Location

         [ processado em: 6,97 segundos.]  privacidade | licença | © 2001, Dammous.