Business Intelligence Analyst Interview Project — what you can expect (OLTP, Python ETL & OLAP)
Some years ago, as I neared the end of my computer science accreditation, I was lucky enough to interview for a BI Analyst position with a leading social media management company based in my home city of Vancouver, Canada; I’m sure you can figure out ‘WHOOO’.
The interview was for an intermediate Business Intelligence Analyst role and, having no experience but a keen interest in the topic, I thought, “hey, why not!?” I went in with the right attitude and was happy to sit down with a few key players (“WHOO”) who, to my surprise, allowed me to complete a small project, their thinking being similar to mine: “hey, why not!?”
Unfortunately, I did not get the job, but as I said, that was never really the intention. I got some good real-world interview experience prior to my graduation and learned what I might expect from interview-style take-home projects.
Funnily enough, I completed the project over that Christmas break and, some 7 years later over the Christmas holidays, I decided to revisit it and rework the areas that (surely) needed improvement.
I have uploaded my PDF report to GitHub, including recommendations and an explanation of the task at hand. I did not update this report and was happy to find it was clear and had some good insight (for a student).
The scripts, however, required an update, which is exactly what I have done.
Initially I leveraged MySQL, Apache, phpMyAdmin, Python 2, and the MySQLdb library for the database connection. First things first, I got to work updating things to Python 3, which required an updated MySQL driver (“driver” used loosely here). I also reworked the code; you can find the updated, but far from complete, code HERE (see the initial commit for a starting point).
I will walk you through what you’d need to run the project, and then what I changed. This article covers the assignment expectations, the OLTP and OLAP databases (with diagrams), the Python scripts, and the report itself.
My hope is that this gives some candidates an idea of what they may expect from BI Analyst interviews and sample projects.
Follow this link to get the mysql-connector (Python Driver for MySQL — Connector/Python), select your operating system (in my case, macOS), and install via the following:
sudo python3 -m pip install mysql-connector
Once installed, you will have the ability to import and connect to a MySQL db.
If you already have a database, or once you have created one, you can perform a similar, db-specific, connection:
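The original post showed this as a screenshot; here is a minimal sketch of what such a connection helper might look like. The host and credentials are placeholders, and the import is deferred inside the function so the module can still load on a machine without the driver installed:

```python
def get_connection(database=None):
    """Connect to the local MySQL server; pass a database name to make
    the db-specific connection. Host and credentials are placeholders."""
    import mysql.connector  # deferred so the module imports without the driver
    params = {"host": "localhost", "user": "root", "password": "secret"}
    if database is not None:
        params["database"] = database
    return mysql.connector.connect(**params)
```

Calling `get_connection()` gives a server-level connection (useful for creating the database), while `get_connection("app_db")` connects directly to an existing database.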
Now, I had originally included the SQL dump to create the initial db with dummy data; this can be found on GitHub. BUT I have since updated my script to run this database creation auto-MAGICALLY when you run Main.py, something I should have completed in my original submission. See our function for initial database creation from SQL:
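The function itself appeared as a screenshot in the original; a minimal sketch of the idea, assuming a simple `db.sql` dump of `;`-terminated statements (no stored procedures), might be:

```python
def create_database_from_sql(cursor, sql_path="db.sql"):
    """Read the SQL dump and execute each statement so the schema and
    dummy data are created automatically when Main.py runs.
    (`db.sql` and the naive ';' split are illustrative assumptions.)"""
    with open(sql_path) as f:
        statements = [s.strip() for s in f.read().split(";") if s.strip()]
    for statement in statements:
        cursor.execute(statement)
    return len(statements)  # handy for logging how much setup work ran
```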
Database & Data warehouse
As my report makes clear, we have a requirement to store data in a relational database. The tables I decided to implement are as follows:
Users, Location, Session, Plan, and Project. This data supplies a nice sample for ETL and metric gathering, and allows for expansion, improvement, and recommendations.
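The actual column definitions live in the SQL dump on GitHub; purely as an illustration, two of these tables might look something like the following (the columns here are my guesses, not the originals):

```sql
-- Illustrative only: the real definitions are in the db.sql dump on GitHub.
CREATE TABLE Plan (
    plan_id  INT AUTO_INCREMENT PRIMARY KEY,
    name     VARCHAR(50) NOT NULL,       -- e.g. Free, Pro, Enterprise
    price    DECIMAL(6, 2) NOT NULL
);

CREATE TABLE Users (
    user_id     INT AUTO_INCREMENT PRIMARY KEY,
    plan_id     INT NOT NULL,
    email       VARCHAR(255) NOT NULL UNIQUE,
    created_at  DATETIME NOT NULL,
    FOREIGN KEY (plan_id) REFERENCES Plan(plan_id)
);
```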
Extended Model: My original db would allow for additional app features, and the subsequent data generated from their use. I diagrammed an extension of the initial model which suggests ‘Projects’ should allow for ‘Tasks’ and ‘Comments’ on said ‘Tasks’ (think Jira). These simple additions would provide engagement data as well as an indication of task completion RATE, similar to burn-down, which is far more insightful with regard to incremental progress; a ‘Project’ may take months to complete, and we would otherwise be left with little to no data on the progress between ‘Project’ creation and ‘Project’ completion.
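In SQL terms, that extension amounts to something like the following sketch (table and column names are my own, not taken from the diagram):

```sql
-- Tasks belong to a Project; completed_at lets us compute completion RATE.
CREATE TABLE Task (
    task_id       INT AUTO_INCREMENT PRIMARY KEY,
    project_id    INT NOT NULL,
    created_at    DATETIME NOT NULL,
    completed_at  DATETIME NULL,          -- stays NULL until the task is done
    FOREIGN KEY (project_id) REFERENCES Project(project_id)
);

-- Comments on Tasks give us per-user engagement data.
CREATE TABLE Comment (
    comment_id  INT AUTO_INCREMENT PRIMARY KEY,
    task_id     INT NOT NULL,
    user_id     INT NOT NULL,
    body        TEXT NOT NULL,
    created_at  DATETIME NOT NULL,
    FOREIGN KEY (task_id) REFERENCES Task(task_id)
);
```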
I think it’s always a good idea to go beyond the assignment requirements and add suggestions as to where you could go next. Always keep in mind how you might allow for the addition of new features or ‘scale up’ in the future; this is a VERY real situation in the workplace. Not only does this show that you are able to relate the data and assignment to a real-world application, it shows you were considering the future in your initial designs.
ETL to Data Warehouse
ETL is the general procedure of copying data from one or more sources into a destination system which represents the data differently from the source(s) or in a different context than the source(s) — wikipedia.org
Due to the simplicity of our database, I chose the star schema for our data warehouse/datamart. We have a single fact table and only 4 dimension tables. This simplifies the data (fewer joins) and allows us to easily search for the data points we care about.
The star schema separates business process data into facts, which hold the measurable, quantitative data about a business, and dimensions which are descriptive attributes related to fact data — wikipedia.org
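Concretely, in this project that might translate to a session fact table keyed to the dimensions (a sketch only; the real warehouse tables are in the repo, and the measure column is my assumption):

```sql
-- One row per session (the measurable event), keyed to descriptive dimensions.
CREATE TABLE fact_session (
    session_key   INT AUTO_INCREMENT PRIMARY KEY,
    user_key      INT NOT NULL,   -- -> dim_user
    location_key  INT NOT NULL,   -- -> dim_location
    plan_key      INT NOT NULL,   -- -> dim_plan
    date_key      INT NOT NULL,   -- -> dim_date (Month, Day, Year)
    duration_min  INT NOT NULL    -- the measure we aggregate and report on
);
```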
Our transformation is rudimentary in this case (see the requirements), meaning we have done close to the minimum with regard to transformation and data cleaning (this project was under a very short timeline). As a small example, we have separated our dateTime into Month, Day, and Year, and joined our tables as need be.
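As a sketch of that transform step (the row layout and timestamp format here are assumptions for illustration):

```python
from datetime import datetime

def transform_session(row):
    """Split the OLTP dateTime string into the Month, Day, and Year
    columns the date dimension expects. Row layout is illustrative."""
    session_id, user_id, started_at = row
    dt = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S")
    return {"session_id": session_id, "user_id": user_id,
            "month": dt.month, "day": dt.day, "year": dt.year}
```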
Python performs our migration of data to the data warehouse without the use of any libraries; looking back, this was likely a mistake on my part. You should always, unless explicitly asked to do otherwise, leverage existing tools and libraries. ‘Re-inventing’ the wheel goes against computer science principles of efficiency, opens the door to well-known points of failure, and wastes time that could be better spent elsewhere.
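For instance, the same Month/Day/Year split could lean on pandas. This is a sketch: in the real project the frame would come from `pd.read_sql(...)` against the OLTP connection and be loaded with `to_sql`, rather than built by hand:

```python
import pandas as pd

# Hypothetical extract; really this would be pd.read_sql("SELECT ...", conn).
sessions = pd.DataFrame({
    "session_id": [1, 2],
    "started_at": pd.to_datetime(["2013-12-25 10:30:00",
                                  "2013-12-26 09:00:00"]),
})

# The Month/Day/Year transform, declaratively, instead of hand-rolled loops.
sessions["month"] = sessions["started_at"].dt.month
sessions["day"] = sessions["started_at"].dt.day
sessions["year"] = sessions["started_at"].dt.year
# sessions.to_sql("dim_date", warehouse_engine, if_exists="append")
# would then handle the load step.
```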
In my original report I do make some recommendations for the App in accordance with my extended database model. Again you can find the full report HERE for review.
Now, the code was not initially abstracted well at all; essentially, everything, including my SQL, was in a single Python file.
I have addressed this prior to pushing my, somewhat updated, Python script. Here you can see the updated code, abstracted for simplification and the elimination of repetition (DRY); it’s much simpler:
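The updated code is on GitHub; the flavour of the abstraction is something like this (the class and method names here are my own, for illustration, not necessarily those in the repo):

```python
class Database:
    """Thin wrapper so connecting and running queries live in one place,
    instead of cursor boilerplate repeated throughout Main.py (DRY)."""

    def __init__(self, connection):
        self.conn = connection

    def run(self, sql, params=()):
        """Execute one statement and return any rows it produced."""
        cursor = self.conn.cursor()
        cursor.execute(sql, params)
        return cursor.fetchall()
```

The SQL strings themselves can then move out of the main script into their own module, so Main.py reads as a short sequence of high-level steps.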
To be clear, I can say with confidence that I made a few mistakes in my original attempt at this project. I have incorporated some of the feedback I received, listed here:
- I did not script the initial db creation; rather, I supplied a db.sql dump for manual db generation. You should not assume a reviewer has the time or skill to perform any setup; not only that, but this instantly leaves the impression that we either a) lack the skill or b) are too lazy to follow up on the details.
- I failed to abstract my code well — a major red-flag for reviewers.
- I did not leverage existing tools and libraries (e.g., Pandas): this is wasteful and error-prone. I could have better spent my time improving or elaborating on the project had I properly leveraged existing libraries.
- I did not include a robust ETL process with extensive examples of transformation/cleaning — this is very important to the ETL process, you may have heard of the 80/20 dilemma; by largely ignoring this part of the assignment (although not specified in the criteria) I have shown my ignorance with regard to what is important.
Now, for the most part I have corrected the first two of these, and I expect I may revisit this project in the near future. For the time being, I hope you enjoyed seeing what kind of interview project you might expect when interviewing for an intermediate Business Intelligence Analyst role, and some common pitfalls to avoid.