Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video, I go through the different types of binary classification metrics. These include: accuracy, prevalence, confusion matrices, sensitivity (aka recall or true positive rate), specificity (aka true negative rate), precision (aka positive predictive value), F1 score, and the areas under the precision-recall curve and the receiver operating characteristic curve, that is: AUPRC and AUROC. We close with how to implement these using the scikit-learn package in Python, going through a Jupyter notebook.
Code can be found here: https://github.com/RichardOnData/YouTube/blob/main/Python%20Notebooks/classification_metrics.ipynb
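As a rough illustration of the metrics listed above (this is a minimal pure-Python sketch, not the contents of the linked notebook): the labels, scores, and the 0.5 threshold below are made-up toy values, and the function names are hypothetical. It assumes both classes appear in the data and at least one positive prediction is made, so no denominator is zero. In scikit-learn the corresponding functions live in sklearn.metrics (for example confusion_matrix, recall_score, precision_score, f1_score, and roc_auc_score).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute the threshold-based metrics from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auroc(y_true, scores):
    """AUROC as the chance a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and only
    meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

metrics = classification_metrics(y_true, y_pred)
area = auroc(y_true, scores)
print(metrics)
print("AUROC:", area)
```

Note that accuracy, sensitivity, specificity, precision, and F1 all depend on the chosen cutoff, while AUROC is computed from the scores directly and does not.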
Patreon: https://www.patreon.com/richardondata
BTC: 3LM5d1vibhp1F7pcxAFX8Ys1DM6XLUoNVL
ETH: 0x3CfC599C4c1040963B644780a0E62d45999bE9D8
LTC: MH8yPjvSmKvpmRRmufofjRB9hnRAFHfx32
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video, I talk about SHAP values and how these can be used for explainable AI: explaining how features contribute to a machine learning model's predictions for each observation. These are great tools when your goal isn't (only) prediction, but is also inference - that is, understanding the most important features that influence a response, or exactly how one feature, at different levels, influences the response. Note that this is different from causal inference, which is a separate and very complex topic altogether.
SHAP value documentation:
https://shap.readthedocs.io/en/latest/example_notebooks/overviews/An%20introduction%20to%20explainable%20AI%20with%20Shapley%20values.html
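To make the idea of per-observation feature contributions concrete, here is a minimal sketch that computes exact Shapley values by brute force for one observation. It is not how the shap package works internally (that package uses background data and faster estimators); the toy model, the all-zeros baseline used to stand in for "absent" features, and the function names are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for a single observation x.

    A feature that is "absent" from a coalition is replaced by its
    baseline value (an assumption of this sketch). The cost grows as
    2^n in the number of features, so this is only for tiny examples.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        contributions.append(total)
    return contributions


def toy_model(z):
    # Hypothetical model: two linear terms plus one interaction.
    return 2.0 * z[0] + 1.0 * z[1] + 3.0 * z[0] * z[2]


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)  # contributions of features 0, 1, 2
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions add up to the difference between the prediction for x and the prediction for the baseline, and the interaction term is split evenly between the two features involved in it.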
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
NOTE: Sorry about the bad audio quality on this one. I switched microphones when I upgraded phones recently, and thought during testing that it would be a lot better than it was here. Looking into a REAL microphone upgrade here.
NOTE 2: I didn't talk about DALL-E on this one, which is another feature of GPT-4. The focus here was mostly on the type of use cases I assume most people are using ChatGPT for these days, as opposed to image generation.
In this video I discuss ChatGPT 4 and compare it to ChatGPT-3.5, which most people use by default. I look at the file upload feature, compare the responses side by side, and then talk about who exactly should make the upgrade.
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video I'll break down some tips that I have to get data jobs. This is going to be broad and apply to all types of positions, whether those are data analyst, data science, or data engineering jobs!
To summarize:
1) Have a good education in a field like statistics, computer science, math, engineering, business, or physics. The higher up you go, the better.
2) Get some experience. That might mean asking your network if they need help, or using Upwork or Fiverr and getting creative.
3) Tell good stories. Lay out: context, problem, data, solution, and impact.
4) Develop differentiating skills. I think there's value in data scientists getting more engineering skills and data engineers getting more analysis skills.
5) Slow down and think more about jobs you want. Don't apply everywhere; find jobs you're more interested in and give more effort to those applications and to honing the right skills for them.
AWS certifications: https://aws.amazon.com/certification/
Azure fundamentals: https://learn.microsoft.com/en-us/credentials/certifications/azure-fundamentals/
GCP certifications: https://cloud.google.com/learn/certification?hl=en
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video I show some ways I've used ChatGPT both to learn and to do data science faster. ChatGPT can be an excellent tool if you're responsible with it. It can provide great ideas to help get through creative roadblocks, and it can generate great coding examples that you can turn around and use to learn. You DON'T want to become reliant on it to write code for you!
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
My thoughts on the data job market in 2024. I looked at data scientist, data analyst, data engineer, and machine learning engineer jobs. In particular we talk about some broader trends in tech more recently, the recent tech layoffs, and what hiring and salaries are looking like for these positions.
Crunchbase: https://news.crunchbase.com/startups/tech-layoffs/
InterviewQuery (December Update): https://www.interviewquery.com/p/december-data-science-job-market
McKinsey Tech Trends: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-top-trends-in-tech
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
NOTE: The beginning of this video is somewhat tongue in cheek. Certain things, you just have to let yourself have fun with. Some of the articles and videos I reference make very different points, specifically regarding the rise of data engineering and constructing end-to-end machine learning pipelines. Those are valid points that I don't want to be dismissive of, but they are the subject for a different video.
In this video I specifically want to focus on, and give my opinions on, a commonly raised concern: that of ChatGPT and generative AI displacing data science and data analyst jobs. Here are my thoughts on the broader question of whether data science will die in 5 years, specifically as a result of AI and LLMs taking said jobs.
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video we're revisiting the R vs Python comparison in the year 2024. How do they stand in recent job reports and in indices like PyPL or the TIOBE index?
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video I talk about every data science job I've had, how each job was dramatically different from the others, and how each one sort of led to the next.
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video I recommend 10 packages that you should know and focus on to get strong at Python programming in the context of data science.
Recommended book "Python for Data Analysis": https://amzn.to/3cDXKcE
1. pandas
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
2. numpy
https://images.datacamp.com/image/upload/v1676302459/Marketing/Blog/Numpy_Cheat_Sheet.pdf
3. matplotlib
https://matplotlib.org/cheatsheets/_images/cheatsheets-1.png
4. seaborn
https://images.datacamp.com/image/upload/v1676302629/Marketing/Blog/Seaborn_Cheat_Sheet.pdf
5. datetime
https://www.pythoncheatsheet.org/modules/datetime-module
6. statsmodels
https://www.statsmodels.org/dev/examples/index.html
7. scikit-learn
https://images.datacamp.com/image/upload/v1676302389/Marketing/Blog/Scikit-Learn_Cheat_Sheet.pdf
8. streamlit
https://docs.streamlit.io/library/cheatsheet
9. tensorflow
https://zerotomastery.io/cheatsheets/tensorflow-cheat-sheet/
10. keras
https://images.datacamp.com/image/upload/v1660903348/Keras_Cheat_Sheet_gssmi8.pdf
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video I cover survival analysis: specifically, what it is, and why it's useful when the time until an event is important and when you have "censored" data. I talk about what censored data is and provide definitions of the survival and hazard functions. This is illustrated visually by showing a Kaplan-Meier curve as well as the idea behind the logrank test. We close with modeling. There are many different survival models, such as survival trees and survival forests. For parametric versions there are accelerated failure time models. I show an example of a Cox proportional hazards model and explain what exactly the (very important) proportional hazards assumption means.
Real world example: https://web.cortland.edu/matresearch/
Censored data example: https://blog.minitab.com/en/michelle-paret/the-difference-between-right-left-and-interval-censored-data
Kaplan-Meier Curve example: https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_survival/BS704_Survival_print.html
Logrank test example: https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Survival/BS704_Survival5.html
Cox model example: https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_survival/BS704_Survival6.html
Packages for survival analysis:
Python: https://github.com/CamDavidsonPilon/lifelines
R: https://github.com/therneau/survival
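As a small illustration of how a Kaplan-Meier curve handles censored data, here is a minimal pure-Python sketch of the product-limit estimate. It is not the implementation used by lifelines or the R survival package; the durations and event flags are made-up toy values and the function name is hypothetical. A censored subject (observed = False) still counts as "at risk" up to and including its last known time, but never counts as an event.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time to event or to censoring for each subject.
    observed:  True if the event was seen, False if censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone whose follow-up reaches time t is still at risk.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): six subjects, two of them censored.
durations = [2, 3, 3, 5, 6, 8]
observed = [True, True, False, True, False, True]

km_curve = kaplan_meier(durations, observed)
for time, prob in km_curve:
    print(time, round(prob, 4))
```

The curve only steps down at times where an event was observed; the censored subjects at times 3 and 6 shrink the risk set for later steps without causing a drop themselves.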
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
ChatGPT:
Bri Does AI: https://www.youtube.com/watch?v=MnDudvCyWpc
Ryan Scribner: https://www.youtube.com/watch?v=X9ksiScY7hM
Statistics:
Duke:...
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video, I walk you through how to set up your Python development environment. If you're a complete beginner, you'll probably be good with just Anaconda/JupyterLab/Jupyter Notebooks. If you're going to be a serious developer, you'll want to use Visual Studio Code and as a best practice set up virtual environments.
Anaconda: https://www.anaconda.com/download
Visual Studio Code: https://code.visualstudio.com/download
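For the virtual environment step, the usual route is the command line (python -m venv followed by a directory name). The same thing can be sketched from Python with the standard library's venv module. The directory name "demo-env" and the helper function are hypothetical, and the sketch deliberately skips installing pip (with_pip=False) to stay fast and offline; a real project environment would normally include pip.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a virtual environment at env_dir; return its config file path."""
    # with_pip=False is an assumption of this sketch, chosen so it runs
    # without network access or the ensurepip module.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"


# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "demo-env")
    created = cfg.exists()
    config_text = cfg.read_text() if created else ""

print("environment created:", created)
print(config_text)
```

The pyvenv.cfg file is what marks the directory as a virtual environment; once an environment exists at a permanent location, Visual Studio Code can be pointed at its interpreter.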
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
'Journey to Become a Google Cloud Machine Learning Engineer': https://amzn.to/3TjwmYT
Exam guide: https://cloud.google.com/learn/certification/guides/machine-learning-engineer
Github compilation: https://github.com/sathishvj/awesome-gcp-certifications/blob/master/professional-machine-learning-engineer.md
Medium articles:
https://towardsdatascience.com/a-comprehensive-study-guide-for-the-google-professional-machine-learning-engineer-certification-1e411db4d2cf
https://datacouch.medium.com/google-cloud-professional-machine-learning-engineer-certification-preparation-guide-2067478767ff
Subscribe to RichardOnData here: https://www.youtube.com/channel/UCKPyg5gsnt6h0aA8EBw3i6A?sub_confirmation=1
In this video, I discuss statistician Edward Tufte's famous 6 principles of graphical integrity. Credit to Radhika R for the excellent LinkedIn Pulse article displayed for the majority of this video.
Articles:
https://www.linkedin.com/pulse/edward-tuftes-six-principles-graphical-integrity-radhika-raghu/
https://www.businessinsider.com/the-27-worst-charts-of-all-time-2013-6#welcome-to-fox-where-the-line-graphs-are-made-up-and-the-points-dont-matter-12
https://infovis-wiki.net/wiki/Data-Ink_Ratio
https://anilbas.github.io/teaching/hci/week13/Tufte.pdf
https://clauswilke.com/dataviz/no-3d.html
Books:
"The Visual Display of Quantitative Information": https://amzn.to/3A7bGtJ
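One of Tufte's integrity principles is that the size of an effect shown in a graphic should be proportional to the size of the effect in the data, and his "lie factor" measures how far a chart departs from that. A minimal sketch, with made-up numbers (bar heights of 1 cm and 4 cm drawn for data values of 100 and 150) and hypothetical function names:

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Tufte's lie factor: effect shown in the graphic / effect in the data.

    A value near 1 means the graphic is proportional to the data;
    values well above 1 exaggerate the change, values below 1 understate it.
    """
    shown = effect_size(graphic_first, graphic_second)
    actual = effect_size(data_first, data_second)
    return shown / actual


# Hypothetical chart: bars drawn 1 cm and 4 cm tall for values 100 and 150.
factor = lie_factor(1.0, 4.0, 100.0, 150.0)
print(factor)  # the graphic shows a 300% change for a 50% change in the data
```

A truncated y-axis is a common way to end up with a lie factor well above 1, because the drawn bar heights no longer start from zero while the data values do.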
#DataScience #BreakingIntoDataScience #StatisticsForDataScience
PayPal: richardondata@gmail.com