Coding Session: Mean Age of Patients

The topic of my patient demographics has come up on several occasions, and although I’m fairly certain of what my instincts are telling me, it’s really just a best guess. No more. Today I decided to waste a few hours and find out the mean and median age of my patients once and for all.

Accessing medical records at home is a privacy concern that I’d rather not contend with, so I’ll be using the next best thing: Google Calendar. I noticed one day that the scheduling system we use syncs to an iCalendar, which is then privately shared with my Google account. I use this iCalendar in Outlook to check my schedule, and so I thought I’d start by parsing this ICS file.

$ical_file = file_get_contents("ICS_URL");
$rows = explode("\n", $ical_file);

Now the $rows array contains everything in the ICS delimited by the newline symbol. By reading out the array row by row, here is a sample event out of thousands:

BEGIN:VEVENT
DTSTART:20161213T053000Z
DTEND:20161213T060000Z
DTSTAMP:20160914T115807Z
CREATED:20160914T041047Z
DESCRIPTION:
LAST-MODIFIED:20160914T060049Z
LOCATION:
SEQUENCE:0
STATUS:CONFIRMED
SUMMARY:[0850101]-李小龍-[2016/12/13 13:30:00~2016/12/13 14:00:00]-1050914042
TRANSP:OPAQUE
END:VEVENT

The events summary in the calendar all use the same format:

SUMMARY:[0850101]-李小龍-[2016/12/13 13:30:00~2016/12/13 14:00:00]-1050914042

So if we break this down, we get the 3-digit lunar birth year, birth month, birth date, full Chinese name, and then some other scheduling information. All of this is preceded by a tag separated with a colon. For mean age calculations, I need a database of all my patients’ birthdates without duplicates.

First parse all rows into its respective tags and contents.

$row_data = explode(':', $data);
$info[$row]['tag'] = $row_data[0];
$info[$row]['data'] = $row_data[1];

Where $data is the raw dump of the each row, and $info is the new array that contains the tag (SUMMARY) and the content ([085…). Once the SUMMARY row is encountered, the name is parsed.

$name = explode('-', $info[$row]['data']);

Then the birthdate is parsed.

$lunaryear = substr($name[0], 1, 3);
$birthday= substr($name[0], 4, 4);
$birthyear = (int)$lunaryear + 1911;
$birthdate = $birthyear . $birthday;

Since the original birthdates were in lunar calendar years, I had to do a pretty inefficient conversion to Gregorian. The last step is to put all this stuff into the MySQL database. Starting with a duplicate check followed by an insert.

$dupesql = $conn->query("SELECT * FROM patients_db WHERE name = '$name' AND bdate = '$birthdate'");
if ($dupesql->num_rows > 0) {
&nbsp echo "duplicate!"
&nbsp }else{
&nbsp $sql = $conn->query("INSERT INTO patients_db (name, bdate) VALUES ('$name', '$birthday')");
}

All that’s left is to export the SQL database into excel and bam! Histogram! Interestingly, I don’t recall seeing so many toddler patients, so maybe something funky’s going on in the database. But that’s another exercise for another time.