Tuesday, 12 May 2009

Database issues

The “Operation” table has 7531 cases, each representing a single patient.

The “Follow-up” table has 7431 cases. However, a patient may have had several visits recorded in the database. A brief analysis of the data shows that the table only contains information on 3703 patients.
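As a rough illustration of that count, here is a minimal Python sketch. The rows and the `patient_id` field are made up for the example; the real column names are not given in this post.

```python
# Hypothetical follow-up rows: (patient_id, visit_date).
# Both the field layout and the values are assumptions for illustration.
follow_up = [
    (101, "2008-01-15"),
    (101, "2008-07-20"),
    (102, "2008-03-02"),
    (103, "2008-05-11"),
    (103, "2009-01-09"),
]


def count_patients(rows):
    """Return (number of rows, number of distinct patient ids)."""
    distinct_ids = {patient_id for patient_id, _ in rows}
    return len(rows), len(distinct_ids)


n_rows, n_patients = count_patients(follow_up)
print(n_rows, "visits for", n_patients, "patients")
```

The gap between the two numbers is what tells us how many rows are repeat visits.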

Generating the pathological tables requires variables from both tables, so the two tables have to be linked. To do so while avoiding duplicate patient records (a patient should be represented only once in the dataset), data from patients with multiple entries is checked and entered once in a new table. At the end of this step, the new table contains over 3500 records.
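The linking step could be sketched roughly as follows. This is only an illustration: the field names (`patient_id`, `visit_date`, `psa`) are hypothetical, and the rule "keep the latest visit" is an assumption standing in for the manual check described above.

```python
def link_tables(operations, follow_ups):
    """Join each operation row with a single follow-up row per patient.

    Assumption: when a patient has several visits, the latest one is kept.
    Dates are ISO strings (YYYY-MM-DD), so string comparison orders them.
    """
    latest = {}
    for visit in follow_ups:
        pid = visit["patient_id"]
        if pid not in latest or visit["visit_date"] > latest[pid]["visit_date"]:
            latest[pid] = visit

    linked = []
    for op in operations:
        visit = latest.get(op["patient_id"])
        if visit is not None:  # keep only patients present in both tables
            linked.append({**op, **visit})
    return linked


# Made-up sample data.
operations = [
    {"patient_id": 1, "gleason": 6},
    {"patient_id": 2, "gleason": 7},
    {"patient_id": 3, "gleason": 5},
]
follow_ups = [
    {"patient_id": 1, "visit_date": "2008-01-15", "psa": 4.2},
    {"patient_id": 1, "visit_date": "2008-09-30", "psa": 3.9},
    {"patient_id": 3, "visit_date": "2008-05-11", "psa": 6.1},
]

linked = link_tables(operations, follow_ups)
```

Patient 2 has no follow-up row and therefore drops out, which is one reason the linked table is smaller than either source table.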

It is also important to remove all records where the variables of interest hold values that cannot be processed, for example a Gleason score greater than 5 or an extremely high PSA (greater than 400). Another issue that appeared is that the ctstage of some patients has been entered as "2". This makes sense but unfortunately cannot be used in our first study, as ctstage "2" should be divided into "2a", "2b" and "2c" to match the conditions of past studies exactly. (These records can still be useful for other studies.)
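A small sketch of this cleaning step, showing only the PSA and ctstage checks. The field names `psa` and `ctstage` are assumptions, and the records are made up. Rejected records are set aside rather than deleted, since they may serve other studies.

```python
PSA_MAX = 400  # cutoff quoted above


def partition(records):
    """Split records into (usable, set_aside) for the first study."""
    usable, set_aside = [], []
    for rec in records:
        # A bare "2" cannot be mapped to 2a/2b/2c, so it is not usable here.
        if rec["psa"] > PSA_MAX or rec["ctstage"] == "2":
            set_aside.append(rec)
        else:
            usable.append(rec)
    return usable, set_aside


# Made-up sample data.
records = [
    {"patient_id": 1, "psa": 5.3, "ctstage": "2a"},
    {"patient_id": 2, "psa": 950.0, "ctstage": "2b"},
    {"patient_id": 3, "psa": 7.8, "ctstage": "2"},
]

usable, set_aside = partition(records)
```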

Finally, to match the conditions of the last update of the Partin tables (2007), records with a Gleason score under 5 (23) will be excluded, as will those with clinical stages "T1a" (18), "T1b" (43) and "T3a" (27), representing another loss of 106 records.
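These exclusions, with a tally per criterion, might look like the sketch below. The field names `gleason` and `ctstage` and the sample records are hypothetical.

```python
from collections import Counter

EXCLUDED_STAGES = {"T1a", "T1b", "T3a"}
MIN_GLEASON = 5


def apply_exclusions(records):
    """Return (kept records, Counter of exclusion reasons)."""
    kept, reasons = [], Counter()
    for rec in records:
        hits = []
        if rec["gleason"] < MIN_GLEASON:
            hits.append("gleason<5")
        if rec["ctstage"] in EXCLUDED_STAGES:
            hits.append(rec["ctstage"])
        if hits:
            reasons.update(hits)  # a record may be counted under two reasons
        else:
            kept.append(rec)
    return kept, reasons


# Made-up sample data.
sample = [
    {"gleason": 4, "ctstage": "T1c"},
    {"gleason": 6, "ctstage": "T1a"},
    {"gleason": 4, "ctstage": "T3a"},
    {"gleason": 7, "ctstage": "T2a"},
]

kept, reasons = apply_exclusions(sample)
```

In this sketch a record matching two criteria is counted under each of them, so the per-criterion tallies can add up to more than the number of records actually removed.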

The new table, which is intended for use by our algorithm, now contains only 1757 records.
