Thursday, September 20, 2012

Thinking About Databases

Some years ago I had students write a program that involved managing a checkbook and balancing it. I was so proud of it. Later a student told me she liked the programs that were “relevant.” I replied, “Like the checkbook program?” She shook her head and said, “No, like tic tac toe.” I was devastated, but I think I hid it well. I realized that relevant to her was not the same as relevant to me. She didn’t have a checkbook or write checks, so the program was meaningless to her. I might have been better off with some sort of credit card program, but I’m not sure that would have worked either. Fortunately, today there are database applications that our students use and that are relevant to them: instant message programs for one, and Facebook for another.
These are big databases. Huge! Most people don’t even think about how they work behind the scenes. For a computer science student, though, they make great thought exercises. Take your average instant message program.
Obviously there is a database that tracks who one’s contacts are. What happens when you log into the software? Probably something like this:
  1. Verifies username and password. Is the data encrypted? How is it passed safely?
  2. Changes your status to online.
  3. Checks the database for a list of your contacts and looks at the status of each of their records.
  4. Builds your initial display of contacts.
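For a classroom discussion, the four login steps above can be sketched in a few lines of Python. This is a minimal sketch, not how any real messaging service works: the in-memory `users` dictionary stands in for the database, and the names `add_user`, `log_in`, and `hash_password` are hypothetical. On the encryption question, it stores a salted PBKDF2 hash of each password (a one-way hash, not encryption) so the password itself is never kept. Passing the password safely over the network (for example with TLS) is a separate problem this sketch ignores.

```python
import hashlib
import hmac
import os
from typing import Optional

# Hypothetical in-memory "database": username -> record.
users: dict = {}

def hash_password(password: str, salt: bytes) -> bytes:
    # Salted, deliberately slow hash; the plain password is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def add_user(name: str, password: str, contacts: list) -> None:
    salt = os.urandom(16)
    users[name] = {"salt": salt, "pw_hash": hash_password(password, salt),
                   "status": "offline", "contacts": contacts}

def log_in(name: str, password: str) -> Optional[list]:
    record = users.get(name)
    # Step 1: verify username and password (constant-time comparison).
    if record is None or not hmac.compare_digest(
            record["pw_hash"], hash_password(password, record["salt"])):
        return None
    # Step 2: change this user's status to online.
    record["status"] = "online"
    # Steps 3 and 4: read each contact's status and build the display lines.
    return [f"{c}: {users[c]['status']}" for c in record["contacts"] if c in users]
```

For example, after `add_user("alice", "pw1", ["bob"])` and `add_user("bob", "pw2", ["alice"])`, a call to `log_in("bob", "pw2")` returns `["alice: offline"]`, and a wrong password returns `None` without changing anyone’s status.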
Simple enough. What happens when someone has a status change? Several possibilities exist. One is that the software merely changes the status on the user’s record; then each user’s program periodically checks the status of all their contacts and updates the display. Another is that, besides setting the status, the software also checks the status of all of the user’s contacts, finds which ones are online, and notifies each of them.
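To make the two options concrete, here is a toy Python model of both. The `StatusServer` class and its method names are made up for this sketch; a real service would keep these records in a database and send notifications over the network rather than appending to an in-memory inbox.

```python
from collections import defaultdict

class StatusServer:
    """Toy model of two ways to handle a status change (hypothetical design)."""

    def __init__(self, contacts: dict[str, list[str]]):
        self.contacts = contacts                      # user -> list of contacts
        self.status = {u: "offline" for u in contacts}
        self.inbox = defaultdict(list)                # pending notifications per user

    # Option 1: only update the user's own record; clients must poll.
    def set_status_only(self, user: str, status: str) -> None:
        self.status[user] = status

    def poll(self, user: str) -> dict[str, str]:
        # Each client periodically re-reads every contact's status.
        return {c: self.status[c] for c in self.contacts[user]}

    # Option 2: update the record, then push a notification to online contacts.
    def set_status_and_notify(self, user: str, status: str) -> None:
        self.status[user] = status
        for c in self.contacts[user]:
            if self.status[c] == "online":
                self.inbox[c].append((user, status))
```

With option 1, a change only becomes visible when a contact next calls `poll`. With option 2, the server does more work at the moment of the change, but online contacts hear about it right away; offline contacts pick up the new status when they log in and load their contact list.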
These options (and there are probably others that people can think of) have pros and cons. What are they? How does performance differ with each option? Which way scales best (works best for larger numbers of contacts)? There is a lot to discuss. I don’t know the answers, but I’ll bet a discussion among several people would be instructive. Can you write a simulation? How would that work, and how would you use it to test your theories?
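On the simulation question, here is one possible starting point in Python. It counts status lookups ("reads") under each option for a made-up network. Every parameter value is an assumption, contact lists are chosen at random (so they are not necessarily mutual), and one pass through the loop stands for one polling interval.

```python
import random

def simulate(n_users=1000, n_contacts=50, ticks=60, change_prob=0.01,
             online_frac=0.3, seed=1):
    """Count status lookups for polling vs. push under made-up parameters."""
    rng = random.Random(seed)
    users = range(n_users)
    # Each user gets n_contacts distinct random contacts (not necessarily mutual).
    contacts = {u: rng.sample([v for v in users if v != u], n_contacts)
                for u in users}
    online = {u: rng.random() < online_frac for u in users}
    poll_reads = push_reads = push_messages = 0
    for _ in range(ticks):
        for u in users:
            if rng.random() < change_prob:    # u logs in or out this tick
                online[u] = not online[u]
                # Push: check every contact's status, notify the online ones.
                push_reads += n_contacts
                push_messages += sum(online[c] for c in contacts[u])
        # Polling: every online client re-reads all its contacts once per tick.
        poll_reads += n_contacts * sum(online.values())
    return {"poll_reads": poll_reads, "push_reads": push_reads,
            "push_messages": push_messages}

if __name__ == "__main__":
    print(simulate())
```

In this model, polling cost grows with the number of online users and how often they poll, whether or not anything changed, while push cost grows only with the number of status changes. Varying `change_prob`, `online_frac`, and `n_contacts` is a way to test which option scales better under different assumptions.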
What about Facebook? How might that database work? There, each item also has information about who can see it. How is friends of friends handled differently from just friends or from public? If you have 500 friends and each of them has 500 friends, how many people can see an item? What about duplicates? Do duplicates play into things, and if so, how?
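The friends-of-friends question is partly arithmetic. Before duplicates are removed, 500 friends with 500 friends each gives at most 500 + 500 × 500 = 250,500 entries, and the real audience is smaller because mutual friends (and the poster, who appears on every friend’s list) get counted more than once. A set union removes those duplicates automatically. This Python sketch uses a hypothetical `audience` function and a tiny made-up network; it says nothing about how Facebook actually stores visibility.

```python
def audience(owner: str, friends: dict[str, set[str]], visibility: str) -> set[str]:
    """Who may see an item posted by owner (toy model, friendships assumed mutual)."""
    if visibility == "public":
        return set(friends)                  # everyone in this small network
    direct = {owner} | friends[owner]
    if visibility == "friends":
        return direct
    if visibility == "friends_of_friends":
        result = set(direct)
        for f in friends[owner]:
            result |= friends[f]             # set union silently drops duplicates
        return result
    raise ValueError(f"unknown visibility: {visibility}")

# A tiny example network: A's two friends are friends with each other and with D.
network = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
naive = len(network["A"]) + sum(len(network[f]) for f in network["A"])
print(naive, len(audience("A", network, "friends_of_friends")))  # 8 4
```

Counting list entries naively gives 8, but only 4 distinct people can see A’s friends-of-friends item. The same overlap effect is why the 250,500 figure is only an upper bound.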
At the university level, most computer science majors will take formal coursework in databases: one, two, or more courses. Most often we don’t have time for that in high school. In fact, many times there isn’t time for more than a discussion of databases. It’s a reality. But I think that students can learn a lot from discussions of databases built around applications that they know and love. The technical issues are fascinating (well, they are to me), but I also think that students can get value from discussing the ethical and social impact of database applications. Topics like encryption of data (passwords if nothing else) and how all this data is used are natural discussion points.
Privacy and the use of data for marketing purposes on Facebook is a huge topic among people concerned with data privacy. Are our students even aware of the issue? I think that a well-educated CS student should at least be exposed to the issues. What do you think?
