Hello!

I'm Dr William Nicholson

Data Scientist, Project Manager, Machine Learning expert

IBM Machine Learning Professional (2022), AWS Cloud Technician, GCP Cloud Digital Leader

PRINCE2® Practitioner

More than 10 years of experience across academia and industry

Skill Sets

Articles by topic

Articles and coding demonstrations covering data science and analysis topics that reflect my main skill sets and career interests

Most recent articles

The six most recent blog posts are shown below. For the rest, please see the section just above. Thank you!

Topic Modeling with BERTopic and DataMapPlot

Topic modeling is a form of text analysis that uses unsupervised machine learning to identify patterns, themes, clusters, and groups across a collection of documents. In this article I discuss using the powerful BERTopic library alongside quantized large language models to identify themes and topics in a collection of research papers at the intersection of artificial intelligence and ophthalmology. Then, we'll use the DataMapPlot library to produce a publication-ready visualization of the thematic structure contained within the abstracts of those research papers.
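
As a small taste of what happens under the hood: after BERTopic clusters the document embeddings, it describes each cluster with a class-based TF-IDF (c-TF-IDF), where all documents in a topic are treated as one "class document" and each word is weighted by its normalized frequency in that topic times log(1 + A / f), with A the average number of words per topic and f the word's frequency across all topics. The sketch below is a simplified, standard-library-only illustration of that weighting, not the library's own implementation; the function names and the toy abstracts are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_topic):
    """Simplified class-based TF-IDF (hypothetical helper, not the BERTopic API).

    docs_by_topic maps a topic id to the documents assigned to that topic.
    """
    # Word counts per topic: each topic is treated as one concatenated document
    counts = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_by_topic.items()
    }
    # f: how often each word occurs across all topics combined
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)
    # A: average number of words per topic
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        scores[topic] = {
            # normalized term frequency * log(1 + A / f)
            word: (freq / n_words) * math.log(1 + avg_words / total[word])
            for word, freq in topic_counts.items()
        }
    return scores


def top_words(scores, topic, k=3):
    """Return the k highest-scoring words for one topic."""
    ranked = sorted(scores[topic].items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]


# Toy "abstracts" already grouped into two hypothetical topics
topics = {
    0: ["retinal images study", "retinal screening study"],
    1: ["glaucoma risk study", "glaucoma model study"],
}
scores = c_tf_idf(topics)
for topic in topics:
    # "study" occurs in every topic, so it is pushed down each ranking
    print(topic, top_words(scores, topic))
```

Even though "study" appears twice in each topic, it ends up ranked below words that appear only once but belong to a single topic, which is exactly why c-TF-IDF produces more distinctive topic labels than raw word counts.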

Loading data into Google Colab

Google Colab (or Colaboratory) is a complete, modern, cloud-based runtime environment. It lets individuals and teams work together on coding, data science and machine learning problems with shared access to data, state-of-the-art GPUs and TPUs, and industry-standard Python libraries. You are provided with an executable document, i.e. a notebook much like a Jupyter Notebook, in which you can write and execute both code and markdown and visualize your results. In this article I'll cover one of the most fundamental requirements every team will face: how to get data into your Colab notebook in the first place!

Database Normalization

Even a good database design can't always protect against bad data. But there are plenty of occasions when a good design helps us avoid the bigger problem that bad data can cause: database headaches. In this article I'll discuss database normalization: what it is, why we do it, and how we do it. I'll also discuss how to determine when a database table is normalized 'enough', and what indications suggest that bad data might pose a problem for our database.
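
To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical flat orders table in which each customer's name and email are repeated on every order, and splits it into separate customers and orders tables linked by a foreign key, the kind of decomposition third normal form calls for. The table and column names are illustrative only.

```python
import sqlite3

# Hypothetical un-normalized data: customer details repeat on every order row
FLAT_ORDERS = [
    (1, "Alice", "alice@example.com", "Keyboard"),
    (2, "Alice", "alice@example.com", "Mouse"),
    (3, "Bob", "bob@example.com", "Monitor"),
]


def normalize(conn):
    """Split the flat rows into customers and orders tables."""
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            email       TEXT NOT NULL UNIQUE
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            product     TEXT NOT NULL
        );
    """)
    for order_id, name, email, product in FLAT_ORDERS:
        # Store each customer once; the UNIQUE email turns repeats into no-ops
        cur.execute(
            "INSERT OR IGNORE INTO customers (name, email) VALUES (?, ?)",
            (name, email),
        )
        customer_id = cur.execute(
            "SELECT customer_id FROM customers WHERE email = ?", (email,)
        ).fetchone()[0]
        cur.execute(
            "INSERT INTO orders (order_id, customer_id, product) VALUES (?, ?, ?)",
            (order_id, customer_id, product),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
normalize(conn)

# A join rebuilds the original flat view without storing any duplicates
rows = conn.execute("""
    SELECT o.order_id, c.name, c.email, o.product
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)
```

The payoff is in maintenance: changing Alice's email now means updating a single row in `customers` rather than every order she has ever placed, so the two copies can never drift out of sync.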

Getting started with Amazon Web Services

Amazon Web Services (AWS) was the most popular cloud provider through the first quarter of 2022, controlling 33% of the entire market and beating Microsoft Azure and its 21% share. Both organizations and individual developers use cloud services from AWS, Microsoft, and other vendors for machine learning, data analytics, cloud-native development, application migration, and much more. In this short article I'll discuss how to set up your own AWS account as the root user, then how to create your first IAM admin user, before finally introducing some of the main services that stand out as the most useful throughout a data scientist's career.
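
The article may well walk through these steps in the AWS console. For comparison, the sketch below shows how creating an IAM admin user could be scripted with boto3, assuming you already have credentials that are allowed to manage IAM (AWS advises against creating access keys for the root user, so the console is the usual route for the very first admin). The helper name and arguments are hypothetical; the client is passed in so the function can be exercised without touching a real account.

```python
# AWS-managed policy granting full access to all services and resources
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def create_admin_user(iam, user_name, initial_password):
    """Create an IAM user with console access and administrator rights.

    `iam` is expected to behave like boto3.client("iam") (an assumption:
    any object exposing the same three methods will work).
    """
    # 1. Create the IAM identity itself
    iam.create_user(UserName=user_name)
    # 2. Give it a console password that must be changed at first sign-in
    iam.create_login_profile(
        UserName=user_name,
        Password=initial_password,
        PasswordResetRequired=True,
    )
    # 3. Attach the AWS-managed AdministratorAccess policy
    iam.attach_user_policy(UserName=user_name, PolicyArn=ADMIN_POLICY_ARN)
    return user_name


# Usage (hypothetical values; requires boto3 and suitable credentials):
#   import boto3
#   create_admin_user(boto3.client("iam"), "admin", "a-strong-temporary-password")
```

Attaching the broad AdministratorAccess policy mirrors a common first-admin setup; for day-to-day work you would typically grant narrower permissions instead.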

Integrating Django with Tailwind CSS

Django is one of the most popular full-stack Python web frameworks. Its high-level design makes rapid development of web apps easier and cleaner while requiring less code. Tailwind CSS is rapidly becoming the first-choice CSS framework for styling modern websites. Its utility-first approach makes it far easier to create beautifully styled apps with consistent choices of colour, spacing, typography (and everything else CSS). In this article I'll show you how to combine these two frameworks so that they work together while still monitoring development changes with both the npm and Django development servers.

Rapid Data Visualization and Interactivity with Streamlit

Streamlit is an open-source Python app framework that allows you to rapidly turn your data into an interactive data visualization, all in pure Python code. The apps you create are custom, fast, beautiful and very easy to share. What's more, Streamlit integrates seamlessly with popular existing Python data visualization tools such as Matplotlib and Plotly Express. In this article we'll take some Twitter data and explore it in the context of building an interactive Streamlit data visualization.