Blog

The ramblings & musings in the amiva.cc sphere

In my ever-failing quest to pick up as much knowledge as possible, it came to my attention that DataTalks.Club's Data Engineering bootcamp has started up again. It's a roughly nine-week programme aimed at getting you on board with data engineering in Python, covering things like...

  • Setting up Docker, Cloud Providers, Terraform
  • The whole ELT process, using:
    • dbt
    • Prefect
    • Spark
    • Kafka
  • Data Warehousing

All capped off with a three-week project producing a full set of pipelines, with a fancy dashboard at the end.

Now, I did start this last year and essentially dropped it after a week or two – mainly due to having a newborn and, as a result, no real free time. But this year I'm in a much better position.

Week one kicked off this past Monday with a nice introductory video on their YouTube channel, which introduced the main players, went through all of the entry points (GitHub, Slack, and whatnot), and outlined what the next few weeks will look like.

Following that, all of the course materials are on their GitHub: videos, reading, activities/walkthroughs, and homework for each week, which can be submitted and linked to your personal GitHub repo for points.

The big sell of DE Zoomcamp, for me, is twofold. First, having all of this put together in one place, with a group of people to contact for help (whether the organisers themselves or other participants), is really valuable. Second, and new for 2023, is the stack they're teaching: Prefect and dbt are buzzing at the moment, and both have formed parts of plans I've been drafting in my own work.

So here I am: all signed up, watching the first set of videos, setting up my laptop, and getting the homework out... and to get more points, we have this post!

Week 1

The basics of week 1 are getting everything set up, plus an introduction to Docker. I'm not going to delve too much into the details because, well, if you want to know about it, delve into it yourself. But I can now say I have Docker set up, with one container running Postgres whose data is materialised on my local disk, and another container running a simple pipeline that loads taxi data into said database, along with the relevant Dockerfile and docker-compose.yml files. I also got Terraform set up against Google Cloud Platform.
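For a flavour of what that ingestion pipeline looks like, here's a minimal sketch – not the course's actual script, and the function name, chunk size, and parameters are my own – that reads taxi trip data in chunks with pandas and appends each chunk to a database table via SQLAlchemy. The course targets Postgres, but the connection URL is just a parameter here, so any SQLAlchemy-supported database works.

```python
import pandas as pd
from sqlalchemy import create_engine

def ingest(csv_source, db_url: str, table: str) -> int:
    """Load a CSV of taxi trips into `table`, returning the row count."""
    engine = create_engine(db_url)
    rows = 0
    # Chunked reads keep memory bounded for the large taxi files.
    for chunk in pd.read_csv(csv_source, chunksize=100_000):
        chunk.to_sql(table, engine, if_exists="append", index=False)
        rows += len(chunk)
    return rows
```

Pointed at the course's Postgres container, `db_url` would be something along the lines of `postgresql://user:pass@localhost:5432/ny_taxi`.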

Overall, it was incredibly clear what needed to be done, and any issues that cropped up were easily resolved by the reams of help on the Slack, the group FAQ document, or Dr. Google.

Safe to say I'm all prepped for next week!

#Python #DataTalksClub #DEZoomCamp #Docker #Terraform #Bootcamp #Learning

One of the perks of working for the NHS is the availability of free access to a number of technical groups and sometimes even training. This is where the HSMA 5 programme comes into play.

The HSMA is a programme, originating in the South West, that teaches staff in healthcare and other civil services Python and data science skills; it was opened up to the rest of the country when COVID hit us. From October it'll be on its 5th iteration, and that's the one I've been accepted onto.

Something to help with this is a nice little bit of rubber ducking. So here we have part one of a series of ... some amount, revising what I've been taught. I'll be summarising each week's lessons, along with thoughts, experiments, and whatnot, hopefully to reinforce everything and also share the knowledge further on.

I'm also planning on streaming sessions of me doing stuff over at my OwnCast instance and I'll find somewhere to host recordings too.

Cheerio!

NikCodes

#Python #HSMA #RubberDuck #Learning