Friday, October 2, 2009

Strategy for Understanding Data in a Database

I recently started in a new company where a database does not have any data dictionary at all. I'm now doing a brute force work where I am trying to create my own data dictionary but I was wondering what are your strategies in learning the data inside a database?


Ben Krug said...


In my experience, you can usually find one old-time programmer who can tell you what every detail in the database is for and how it got there. (Assuming someone designed it, and it wasn't all created using GUI tools or the like.)

Hope you can!

Ben Krug
MySQL Support Engineer
b e n at m y s q l d0t c o m

Ben Krug said...

Hi again -

forgot to say, also, using a tool to reverse-engineer the database is the first step, so you can make nice E-R diagrams.

Sheeri K. Cabral said...

I agree with Ben -- MySQL's OSS Workbench will reverse engineer databases, either live or from a "mysqldump --no-data" script.

Foreign keys are a big help if they exist. But usually tables and data are named somewhat well -- the "user" table probably contains user data, for instance.

You can also go through the slow query log (or the general log!) with a tool like mysqlsla or mysqldumpslow and look at the most frequent queries, trying to figure out what they do, etc.

Sometimes size can help -- for example, if there's a table called "user" with 4 rows and a table called "users" with 4,000,000 rows, probably the "users" table is the one that contains one row per user.

Lee said...

I'm glad to hear that there exists no clear consensus. I was faced with a similar task earlier this year, and discovered no single technique that I'd recommend to others. One thing I _did_ discover was that the old-timers you ask will have forgotten much of what they once knew, but will nonetheless make up something that sounds plausible. Thus, you will need to confirm EVERYTHING they tell you. So get used to writing Stored Procedure "sanity checks".