Day 2: Learning Everyday

2 min read · Jan 31, 2022

Today, I continued with my Coursera course on SQL. I also started listening to the Towards Data Science podcast and heard Joel Grus talk about the downsides of Jupyter Notebooks. I was a little shocked to hear that he does not think they should be the default coding environment for a data scientist, but it made sense once he explained the problems with reproducibility and hidden state: cells can be run out of order, so what is on screen may not match what the code produces when run from top to bottom. In research especially, reproducibility is key to building a robust model with solid results.

Another aspect of data science he talked about was unit testing: taking a small chunk of data and running small tests on it to check that your code is working as planned. This could save hours, because a bug is caught before the model starts a training run that takes many hours. I will look for ways to implement this in my current projects and future work.
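To make the idea concrete for myself, here is a minimal sketch of a unit test on a tiny chunk of data. The `normalise` function is a hypothetical preprocessing step I made up for illustration; it is not from the podcast.

```python
def normalise(values):
    """Scale a list of numbers to the range [0, 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_normalise_on_small_sample():
    # A tiny chunk of data whose expected result can be checked by hand.
    sample = [2, 4, 6]
    assert normalise(sample) == [0.0, 0.5, 1.0]
    # Edge case: every value is the same.
    assert normalise([3, 3]) == [0.0, 0.0]


# Run the check in seconds, before starting any long training job.
test_normalise_on_small_sample()
```

A test runner such as pytest should pick up the `test_` function automatically, so the explicit call at the end is only needed when running the file directly.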

Summary — Filtering & Sorting

Why? To retrieve a specific subset of data, reduce the number of retrieved records, increase query performance and reduce strain on the client database
WHERE

SELECT column_name
FROM table_name
WHERE column_name operator value;
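As a quick way to try the WHERE template above, here is a small sketch using Python's built-in sqlite3 module. The `products` table and its rows are made up for illustration and are not from the course.

```python
import sqlite3

# In-memory database with a hypothetical products table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("apple", 0.5), ("bread", 2.0), ("cheese", 4.5)],
)

# WHERE column_name operator value: keep only rows cheaper than 3.
cheap = conn.execute(
    "SELECT name FROM products WHERE price < 3"
).fetchall()

# BETWEEN is inclusive at both ends.
mid_range = conn.execute(
    "SELECT name FROM products WHERE price BETWEEN 2 AND 5"
).fetchall()

print(cheap)
print(mid_range)
```

Each result is a list of one-element tuples, one per matching row. Without ORDER BY the row order is not guaranteed.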

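Can also filter with IN, OR, BETWEEN, AND and NOT
IN vs OR: IN takes a list of values, is often faster to run and can contain another SELECT; with OR, be careful with the order of evaluation, because AND is processed before OR unless parentheses are used
Wildcards (used with LIKE): % matches any run of characters (e.g. '%value'), _ matches a single character (e.g. '_value', supported by some DBMSs only), [] matches one character from a specified set (also DBMS-dependent); a wildcard search can take longer than = or <, and placement is important
ORDER BY: can order by a column that is not selected and by multiple columns; must come last in the query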
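The IN, OR and wildcard notes above can be tried out with a sketch like the following, again using sqlite3 and a made-up `products` table. The `names` helper is my own convenience function. The [] wildcard is left out because SQLite's LIKE only supports % and _.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, supplier_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("apple", 1, 0.5),
        ("apricot", 2, 3.0),
        ("bread", 2, 2.0),
        ("cheese", 3, 4.5),
        ("grape", 3, 1.0),
    ],
)


def names(where_clause):
    """Return the sorted product names matching a WHERE clause (hypothetical helper)."""
    rows = conn.execute("SELECT name FROM products WHERE " + where_clause)
    return sorted(row[0] for row in rows)


# IN with a list of values gives the same rows as chained ORs.
with_in = names("supplier_id IN (1, 3)")
with_or = names("supplier_id = 1 OR supplier_id = 3")

# AND is evaluated before OR, so these two filters differ.
no_parens = names("supplier_id = 1 OR supplier_id = 2 AND price >= 2.5")
parens = names("(supplier_id = 1 OR supplier_id = 2) AND price >= 2.5")

# Wildcards with LIKE: % is any run of characters, _ is exactly one.
starts_ap = names("name LIKE 'ap%'")
one_char = names("name LIKE '_rape'")

print(with_in, with_or)    # ['apple', 'cheese', 'grape'] both times
print(no_parens, parens)   # ['apple', 'apricot'] ['apricot']
print(starts_ap, one_char) # ['apple', 'apricot'] ['grape']
```

Without parentheses, the cheap apple from supplier 1 slips through because the price condition only attaches to supplier 2.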

SELECT column_name_1, column_name_2
FROM table_name
ORDER BY column_name_1, column_name_5, … ASC|DESC; (choose one; ASC is the default and the keyword applies only to the column it follows)
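Here is a small sketch of the ORDER BY template in action with sqlite3. The table and its rows are invented for illustration. It sorts by a column that is not selected, then by two columns with different directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("apple", "fruit", 0.5),
        ("grape", "fruit", 1.0),
        ("bread", "bakery", 2.0),
        ("bagel", "bakery", 1.5),
    ],
)

# Order by price even though price is not in the SELECT list.
by_price = conn.execute(
    "SELECT name FROM products ORDER BY price DESC"
).fetchall()

# Two sort keys: category A to Z, then most expensive first within each category.
by_category = conn.execute(
    "SELECT name, category FROM products ORDER BY category ASC, price DESC"
).fetchall()

print(by_price)
# [('bread',), ('bagel',), ('grape',), ('apple',)]
print(by_category)
# [('bread', 'bakery'), ('bagel', 'bakery'), ('grape', 'fruit'), ('apple', 'fruit')]
```

Note that DESC only affects price in the second query; category still sorts in ascending order.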

Written by Kirsty H