My name is... well, you can call me "Memorious." I'm a software developer, and I spend most of my days maintaining data warehouses or building websites or some sort of web application from somewhere in Brooklyn.
I'm an ecommerce developer, but I also like building Android apps whenever I get the chance, which is not often, but I try. Lately I've gone deeper into data warehousing, mainly PostgreSQL implementations somewhere in the cloud.
If you're wondering what this site is about, well, this is where I test the modules I build, and, sometimes, when I'm not being lazy, I manage to force myself to write about interesting findings I come across during my days developing. I will also try to release some of the modules I build because I'm a glutton for punishment, and I can't wait to hear everyone's positive feedback about them :D.
P.S: "this work is solely my own and not in collaboration with any one of my employers. yadda yadda yadda"
larousse - Data dictionary webapp.
Redshift using an ETL tool called Matillion; while the entire ecosystem works splendidly, I kept running into the same inefficiency: lack of data cataloguing.
Data cataloguing tools, like Alation, should be an essential piece of every data warehouse, but they’re really expensive, so most teams -- burdened with other, more relevant, data warehouse costs -- decide not to use them. After all, Tableau and all other BI tools are also very expensive and much more necessary for, well, the business to run.
You can think of data catalogues as “Data Dictionaries.” They provide data analysts with relevant bits of information that would otherwise take days or even weeks to find. Often enough, whenever a new person needed to create some (Tableau) reports, they came to me to ask about the tables contained in a schema, and about what each of the columns in said tables contained.
Questions like “what does that column mean?”, “which report is using this table?” and “what was joined/transformed to create this table?” were commonplace in this project. I became a bit frustrated with those questions because there are literally hundreds of tables and views in this data warehouse, and I don’t necessarily know what each of them contains… or why they’re even there in some cases. As a Data Engineer I’m much more concerned with data extraction and availability, not analysis! Efficiencies and inefficiencies in my ETL pipelines haunt me.
This frustration led me to suggest building an internal dictionary. Their own internal data catalogue. The approach was to start with the basics. Display all schemas, display all tables within those schemas and finally display all columns, descriptions, statistics, query samples and data samples of those tables… This should be sufficient to get analysts started within a few hours.
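A rough sketch of that first step (schemas, then tables, then columns) could look like the following. It is only an illustration: the post does not say how the metadata was actually pulled, so the `information_schema.columns` query, the `build_catalogue` helper and the sample rows are all assumptions.

```python
from collections import defaultdict

# Assumed metadata query: information_schema.columns is the standard
# PostgreSQL-style catalog view. The post does not state which source was used.
COLUMNS_QUERY = """
SELECT table_schema, table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_schema, table_name, ordinal_position
"""


def build_catalogue(rows):
    """Group (schema, table, column, data_type) rows into schema -> table -> columns."""
    catalogue = defaultdict(lambda: defaultdict(list))
    for schema, table, column, data_type in rows:
        catalogue[schema][table].append({"column": column, "type": data_type})
    # Convert to plain dicts so the result serialises cleanly (e.g. to JSON).
    return {schema: dict(tables) for schema, tables in catalogue.items()}


# Hypothetical rows standing in for the result of COLUMNS_QUERY.
sample_rows = [
    ("sales", "orders", "order_id", "integer"),
    ("sales", "orders", "placed_at", "timestamp without time zone"),
    ("sales", "customers", "customer_id", "integer"),
]
catalogue = build_catalogue(sample_rows)
```

The nested result maps directly onto the three views described above: the top-level keys are the schemas, the second level lists each schema's tables, and each table holds its columns in query order.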
The backend for this webapp, I decided, would be DynamoDB. A table in there named “data_dictionary” would hold the data warehouse metadata and user-entered table descriptions.
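To make that concrete, here is one possible shape for an item in “data_dictionary”. Only the table name comes from the post; the `table_id` key, the other attribute names and the `make_dictionary_item` helper are hypothetical.

```python
def make_dictionary_item(schema, table, columns, description=None):
    """Build one item for the "data_dictionary" table.

    Assumption: "table_id" ("schema.table") is the partition key. The key
    and every attribute name here are made up for illustration.
    """
    item = {
        "table_id": f"{schema}.{table}",
        "schema_name": schema,
        "table_name": table,
        "columns": [{"name": name, "type": data_type} for name, data_type in columns],
    }
    # Only store a description once a user has actually entered one.
    if description:
        item["description"] = description
    return item


item = make_dictionary_item(
    "sales",
    "orders",
    [("order_id", "integer"), ("placed_at", "timestamp")],
    description="One row per customer order.",
)
# With boto3, an item like this could be written with
# Table.put_item(Item=item); that step is left out so the sketch runs offline.
```

Keeping the warehouse metadata and the user-entered description in the same item means one lookup by `table_id` is enough to render a table's page.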
Why didn’t I think of this before!? It's a start! Next up: MongoDB as backend and
At the beginning of 2014, after doing some minor Android development, I found myself no longer willing to do Magento development in the way most agencies and developers were doing it (i.e. dragging and dropping with impunity). My frustrations only increased every time I found myself digging through previous Magento builds just to pull out features to add into other ongoing projects. It felt unclean! The processes we had in place for Magento development back then turned me into a code scavenger, and oftentimes the scavenging didn’t go so well. I complained more than ever to the poor souls willing to listen! Something had to change.