Bootcamp Prep Course
This course takes you one step closer to becoming a data scientist by offering a subset of the topics covered in our Data Science Bootcamp. You’ll get a well-rounded intro to the core concepts and technologies taught within the bootcamp, including basic machine learning principles and hands-on coding experience. Plus, you’ll put it all into practice through a mini data science project of your own. We’ll cover the following:
- Data acquisition, cleaning, and aggregation
- Exploratory data analysis and visualization
- Feature engineering
- Model creation and validation
- Basic statistical and mathematical foundations for data science
The Intro to Data Science instructor’s enthusiasm and ability to explain complex topics made this a great introduction to the fundamentals of data science and Python programming. This course helped prep me for the Metis data science bootcamp, and I’d highly recommend it to anyone looking to gain a better understanding of concepts taught throughout the bootcamp.
Who the course is designed for:
You have a strong desire to learn data science through top-quality instruction, a basic understanding of data analysis techniques, and an interest in improving your ability to tackle data-rich problems in a systematic, principled way. This course provides structure and accountability to ensure you stay on track, finish strong, and achieve your desired outcomes.
- An understanding of problems solvable with data science and an ability to attack them from a statistical perspective.
- An understanding of when to use supervised and unsupervised statistical learning methods on data-rich problems with labeled and unlabeled data.
- The ability to create data analytical pipelines and applications in Python.
- Familiarity with the Python data science ecosystem and the various tools needed to continue developing as a data scientist.
Course Structure & Syllabus
- We start with the basics. For CS, we briefly cover basic data structures/types, program control flow, and syntax in Python. For statistics, we go over basic probability and probability distributions, along with general properties of some common distributions. For linear algebra, we cover matrices, vectors, and some of their properties and how to use them in Python.
- We spend a considerable amount of time using the Pandas Python package to attack a dataset we’ve never seen before, uncovering some useful information from it. At this point, students decide on a course project that would benefit from the data-scientific approach. The project must involve public (freely accessible and usable) data and must answer an interesting question, or collection of questions, about that data. (Several sources of free data will be provided.)
- We learn about two basic kinds of statistical models that have classically been used for prediction (supervised learning): linear regression and logistic regression. We also look at clustering using K-Means, one of the ways you can glean information from unlabeled data.
- We switch gears from talking about algorithms to talk about features. What are they? How do we engineer them? And what can be done (Principal Component Analysis/Independent Component Analysis, regularization) to create and use them given the data at hand? We also cover how to construct complete data pipelines, going from data ingestion and preprocessing to model construction and evaluation.
- We delve into more advanced supervised learning approaches and get a feel for linear support vector machines, decision trees, and random forest models for regression and classification. We also explore DBSCAN, an additional unsupervised learning approach.
- We explore more sophisticated model evaluation approaches (cross-validation and bootstrapping) with the goal of understanding how we can make our models as generalizable as possible. Students complete data science projects and share learnings and discoveries.
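To give a feel for the model evaluation ideas in the last syllabus item, here is a minimal sketch of k-fold cross-validation and bootstrapping applied to a one-variable linear regression. It is an illustration only, not course material: it uses just the Python standard library so it runs anywhere, the function names (`fit_line`, `k_fold_mse`, `bootstrap_slopes`) are hypothetical, and the data is synthetic. In practice, a library from the Python data science ecosystem would typically handle these steps.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        # Score only on points the model did not see during fitting.
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2
                  for i in held_out]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


def bootstrap_slopes(xs, ys, n_resamples=200, seed=0):
    """Refit the line on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        sample = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in sample}) < 2:
            continue  # skip degenerate resamples with a single x value
        _, slope = fit_line([xs[i] for i in sample],
                            [ys[i] for i in sample])
        slopes.append(slope)
    return slopes


if __name__ == "__main__":
    # Synthetic data (made up for this sketch): y = 1.5 + 2x plus noise.
    rng = random.Random(42)
    xs = [i / 4 for i in range(40)]
    ys = [1.5 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

    print("5-fold cross-validated MSE:", k_fold_mse(xs, ys, k=5))
    slopes = sorted(bootstrap_slopes(xs, ys))
    low = slopes[int(0.025 * len(slopes))]
    high = slopes[int(0.975 * len(slopes)) - 1]
    print("Approximate 95% bootstrap interval for the slope:", (low, high))
```

Cross-validation estimates how well the model predicts data it was not fit on, while the bootstrap shows how much a fitted quantity (here, the slope) varies across resampled datasets.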