Day 2: Learning Every Day

Kirsty H
2 min read · Jan 31, 2022

Today, I continued with my Coursera course on SQL. I also started listening to the Towards Data Science podcast and heard Joel Grus talk about the downsides of using Jupyter Notebooks. I was a little shocked to hear that he does not think notebooks should be the default coding environment for a data scientist. However, once he explained the problems with reproducibility and hidden state (results that depend on cells run out of order or since deleted), it made sense. Particularly in research, reproducibility and explicit state are key to building a robust model with solid results.

Another aspect of data science he talked about was unit testing: taking a small chunk of data and running small tests on it to check that your code works as planned. This could save hours if there is a bug in the code and the model takes many hours to train. I will look for ways to implement this in my current projects and future work.
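To make the idea concrete, here is a minimal sketch of a unit test on a small chunk of data. The function `scale_to_unit_range` is a made-up preprocessing step used only for illustration; it is not from the podcast or from any particular library.

```python
def scale_to_unit_range(values):
    """Hypothetical preprocessing step: rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_small_chunk():
    # A tiny, hand-checked sample stands in for the full dataset.
    assert scale_to_unit_range([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_constant_column():
    # Edge case that could otherwise crash hours into a training run.
    assert scale_to_unit_range([3, 3, 3]) == [0.0, 0.0, 0.0]
```

The test functions use plain `assert` statements, so they can be called directly or collected by a test runner such as pytest. The point is that they run in well under a second, long before any expensive training starts.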

Summary — Filtering & Sorting

  • Why? To retrieve a specific subset of data, reduce the number of retrieved records, increase query performance and reduce strain on the client database
  • WHERE

SELECT column_name
FROM table_name
WHERE column_name operator value;

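  • Can also filter with IN, OR, BETWEEN, AND and NOT
  • IN vs OR: IN takes a list of values, is often faster to run than a chain of ORs, and can contain another SELECT (a subquery). With OR, be careful about the order of evaluation: AND is processed before OR, so use parentheses to group conditions
  • Wildcards (used with LIKE): % matches any run of characters (e.g. '%value'), _ matches exactly one character (e.g. '_value', only in some databases), and [] matches one character from a specified set (also only in some databases). A wildcard search can take longer than a comparison with = or <, and the placement of the wildcard is important
  • ORDER BY: can sort by a column that is not selected and by multiple columns; must be the last clause in the SELECT statement (only a row limit such as LIMIT can follow it)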

SELECT column_name_1, column_name_2
FROM table_name
ORDER BY column_name_1 ASC|DESC, column_name_5 ASC|DESC, …; -- choose ASC or DESC for each column (ASC is the default)
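
The clauses summarised above can be tried out with a small runnable sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the `products` table, its rows and the `names` helper are made up for illustration and are not from the course. Other databases may behave differently, for example in which wildcards they support.

```python
import sqlite3

# In-memory database with a hypothetical `products` table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (name TEXT, category TEXT, price REAL, stock INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("Green tea", "drinks", 4.5, 20),
        ("Black tea", "drinks", 3.0, 0),
        ("Oat bar", "snacks", 1.5, 50),
        ("Trail mix", "snacks", 6.0, 5),
        ("Notebook", "office", 2.5, 12),
    ],
)


def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# WHERE with a comparison operator.
cheap = names("SELECT name FROM products WHERE price < 3 ORDER BY name")

# BETWEEN is inclusive at both ends.
mid_price = names(
    "SELECT name FROM products WHERE price BETWEEN 2.5 AND 4.5 ORDER BY name"
)

# IN replaces a chain of ORs on the same column.
in_list = names(
    "SELECT name FROM products WHERE category IN ('drinks', 'office') ORDER BY name"
)

# AND is evaluated before OR, so parentheses change the result.
no_parens = names(
    """SELECT name FROM products
       WHERE category = 'drinks' OR category = 'snacks' AND price > 4
       ORDER BY name"""
)
with_parens = names(
    """SELECT name FROM products
       WHERE (category = 'drinks' OR category = 'snacks') AND price > 4
       ORDER BY name"""
)

# LIKE wildcards: % matches any run of characters, _ exactly one character.
ends_with_tea = names("SELECT name FROM products WHERE name LIKE '%tea' ORDER BY name")
one_char = names("SELECT name FROM products WHERE name LIKE '_at bar'")

# ORDER BY several columns, none of which is selected.
by_category_then_price = names(
    "SELECT name FROM products ORDER BY category ASC, price DESC"
)

print(no_parens)    # drinks at any price, plus snacks over 4
print(with_parens)  # drinks or snacks, but only those over 4
```

Comparing `no_parens` with `with_parens` shows why the order of evaluation matters: without parentheses the price condition applies only to the snacks, so every drink is returned regardless of price.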
