Week | Day | Topics |
---|---|---|
Week 1 | Day 1-2 | Introduction to Data Science: Prelude, The Problem Landscape, Defining Data Science, Demystifying Data Science, Overview of Data Scientist’s Toolbox |
Week 1 | Day 3-4 | Python Basics: Installation, Setup, Data Types, Functions, Data Manipulation & Visualization with Matplotlib and Seaborn |
Week 1 | Day 5-6 | Probability and Statistics: Descriptive Statistics, Probability Distributions (Gaussian, Bernoulli, Binomial) |
Week 1 | Day 7-8 | Holiday |
Week 2 | Day 9-10 | Probability & Statistics: Dispersion (Variance, Standard Deviation), Hypothesis Testing (Z-test, t-test, Chi-square), Correlation Analysis (Pearson, Spearman) |
Week 2 | Day 11-12 | Introduction to NumPy: Array Operations, Random Data Generation |
Week 2 | Day 13-14 | Pandas: Importing Datasets, Data Wrangling, Cleaning, EDA |
Week 2 | Day 15-16 | Holiday |
Week 2 | Day 14 | QA Session |
Week 3 | Day 17-18 | SQL for Data Science: Basic Queries, Joins, Subqueries, Aggregation |
Week 3 | Day 19-20 | SQL: Group By, HAVING, Aggregation Functions, Working with SQL Databases |
Week 3 | Day 21-22 | SciPy & Seaborn: Advanced Visualization, EDA |
Week 3 | Day 23-24 | Holiday |
Week 4 | Day 25-26 | Principles of Information Visualization, Applied Visualizations (Matplotlib, Seaborn) |
Week 4 | Day 27-28 | Tableau Basics: Loading Data, Creating Charts, Basic Visual Analysis |
Week 4 | Day 29-30 | EDA & Hypothesis Testing: Feature Engineering, Statistical Inference |
Week 4 | Day 31-32 | Holiday |
Week 4 | Day 30 | QA Session |
Week 5 | Day 33-34 | Machine Learning Introduction: Supervised vs Unsupervised Learning, Overview of Clustering, Classification, and Regression |
Week 5 | Day 35-36 | Linear Regression: Best Fit Line, Model Training, Model Evaluation |
Week 5 | Day 37-38 | Logistic Regression: Sigmoid Curve, Model Evaluation |
Week 5 | Day 39-40 | Holiday |
Week 6 | Day 41-42 | Support Vector Machine (SVM): Kernel Trick, Hyperplanes, Model Evaluation |
Week 6 | Day 43-44 | K-Nearest Neighbors (KNN): KNN Algorithm, Distance Metrics, Model Evaluation |
Week 6 | Day 45-46 | AutoML for Model Building: Introduction, Tools, Automated Model Optimization |
Week 6 | Day 47-48 | Holiday |
Week 6 | Day 42 | QA Session |
Week 7 | Day 49-50 | Unsupervised Learning: Clustering, K-Means Algorithm, Model Evaluation |
Week 7 | Day 51-52 | Principal Component Analysis (PCA): Dimensionality Reduction, Eigenvectors, Eigenvalues |
Week 7 | Day 53-54 | Text Mining with NLTK: Text Preprocessing, Regex, Text Classification |
Week 7 | Day 55-56 | Holiday |
Week 8 | Day 57-58 | Machine Learning Web App Development with Streamlit: Building Interactive ML Apps, Deploying Models |
Week 8 | Day 59-60 | FastAPI for ML Deployment: API Building, Asynchronous Processing, Deployment & Scaling |
Assignment Plan
Week | Assignment | Description | Due Date | Skills Covered |
---|---|---|---|---|
Week 1 | Assignment 1: Data Science Fundamentals | - Write a report on the role of a Data Scientist, including a comparison between Data Science, AI, ML, and DL. - Complete a basic Python exercise: data types, loops, functions, and basic file handling. | Day 6 | - Data Science Concepts - Python Basics (Variables, Functions, Data Types) |
Week 2 | Assignment 2: Probability & Statistics | - Implement basic statistical tests (z-test, t-test, Chi-square) using Python. - Conduct a descriptive analysis (mean, median, standard deviation) on a dataset. | Day 14 | - Hypothesis Testing - Descriptive Statistics - Data Wrangling with Python |
Week 3 | Assignment 3: SQL Basics | - Create a set of SQL queries to filter, sort, and join data from multiple tables. - Perform aggregation on a dataset using SQL. | Day 22 | - SQL Queries - Data Aggregation and Joins |
Week 4 | Assignment 4: Data Visualization & Tableau | - Create a report on information visualization principles using Matplotlib/Seaborn. - Create a dashboard in Tableau using real-world data (from Excel). | Day 30 | - Data Visualization (Python, Tableau) - Data Analysis & Reporting |
Week 5 | Assignment 5: Regression Analysis | - Implement Linear Regression in Python to predict an outcome based on features. - Evaluate the model's performance (e.g., R-squared, Mean Absolute Error). | Day 38 | - Linear Regression - Model Evaluation Techniques |
Week 6 | Assignment 6: Classification Models | - Implement Logistic Regression and KNN (K-Nearest Neighbors) for a classification problem. - Compare models’ performance (e.g., accuracy, precision, recall). | Day 46 | - Logistic Regression - KNN Algorithm - Classification Metrics |
Week 7 | Assignment 7: Unsupervised Learning | - Implement K-Means clustering on a dataset. - Use PCA (Principal Component Analysis) for dimensionality reduction and explain the results. | Day 54 | - K-Means Clustering - PCA (Dimensionality Reduction) |
Week 8 | Assignment 8: Text Mining & Web App | - Implement a text classification task using NLTK or Regex. - Build a basic ML web app using Streamlit, where users can input data for model predictions. | Day 60 | - Text Mining (NLTK, Regex) - ML Web App Development (Streamlit) |
Assignments:
Week 1 Assignment - Data Science Fundamentals
- Task: Report on Data Science vs. AI, ML, DL.
  - Write a comparison paper highlighting key differences.
- Python Exercise: A simple program to manipulate data and implement basic Python concepts.
  - Example: Build a small Python script that reads a file, processes some data, and outputs a summary of statistics.
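One possible shape for the Week 1 Python exercise is sketched below, using only the standard library. The file name `scores.csv`, the column name `score`, and the helper `summarize_column` are hypothetical and chosen purely for illustration; the sample data is made up.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def summarize_column(csv_path, column):
    """Read one numeric column from a CSV file and return summary statistics."""
    with open(csv_path, newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical sample data written to a temporary file for the demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "scores.csv"
    sample_path.write_text("name,score\nA,2\nB,4\nC,6\n")
    summary = summarize_column(sample_path, "score")

for key, value in summary.items():
    print(f"{key}: {value}")
```

In a real submission the temporary file would be replaced by the dataset the student chooses to analyze.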
Week 2 Assignment - Probability & Statistics
- Task: Implement hypothesis testing using Python:
  - Use libraries like SciPy to implement z-tests, t-tests, and Chi-square tests.
- Descriptive Statistics: Work with datasets to calculate mean, median, mode, and standard deviation.
  - Example: Download a dataset and analyze central tendency, dispersion, and test hypotheses.
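A minimal sketch of the Week 2 tasks follows, assuming NumPy and SciPy are installed. The data is synthetic, and the null mean `mu0`, the "known" population standard deviation `sigma`, and the contingency table are invented for illustration. The z-test is computed by hand from the normal distribution, while the t-test and Chi-square test use `scipy.stats` functions.

```python
import numpy as np
from scipy import stats

# Synthetic sample (assumption: stands in for a real downloaded dataset).
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=52.0, scale=10.0, size=40)
mu0 = 50.0  # hypothesized population mean

# Descriptive statistics.
mean = sample.mean()
median = np.median(sample)
std = sample.std(ddof=1)  # sample standard deviation

# One-sample z-test, assuming the population standard deviation is known.
sigma = 10.0
z_stat = (mean - mu0) / (sigma / np.sqrt(sample.size))
z_p = 2 * stats.norm.sf(abs(z_stat))  # two-sided p-value

# One-sample t-test (population standard deviation unknown).
t_stat, t_p = stats.ttest_1samp(sample, popmean=mu0)

# Chi-square test of independence on a made-up 2x2 contingency table.
observed = np.array([[30, 10], [20, 20]])
chi2, chi_p, dof, expected = stats.chi2_contingency(observed)

print(f"mean={mean:.2f} median={median:.2f} std={std:.2f}")
print(f"z={z_stat:.3f} (p={z_p:.3f}), t={t_stat:.3f} (p={t_p:.3f})")
print(f"chi2={chi2:.3f} (p={chi_p:.3f}, dof={dof})")
```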
Week 3 Assignment - SQL Basics
- Task: Write SQL queries to perform different tasks:
  - Task 1: Filter, sort, and join datasets from multiple tables.
  - Task 2: Use SQL aggregation functions like SUM, AVG, COUNT to analyze data.
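The two SQL tasks can be tried out without a database server by running the queries through Python's built-in `sqlite3` module. The sketch below assumes a hypothetical two-table schema (`customers` and `orders`) with invented rows; the course itself may use a different database or dataset.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ben', 'Delhi'), (3, 'Chen', 'Pune');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, 200.0);
""")

# Task 1: join the two tables, filter by amount, and sort the result.
rows = conn.execute("""
SELECT c.name, o.amount
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE o.amount >= 80
ORDER BY o.amount DESC
""").fetchall()

# Task 2: aggregate per city with COUNT, SUM, and AVG.
totals = conn.execute("""
SELECT c.city, COUNT(*) AS n_orders, SUM(o.amount) AS total, AVG(o.amount) AS average
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.city
ORDER BY c.city
""").fetchall()
conn.close()

print(rows)
print(totals)
```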
Week 4 Assignment - Data Visualization & Tableau
- Task 1: Write a report on data visualization principles (chart types, color usage, etc.).
- Task 2: Create a Tableau dashboard that displays key insights from a dataset (e.g., sales data, customer data, etc.).
Week 5 Assignment - Regression Analysis
- Task: Implement Linear Regression:
  - Train and evaluate the model using Python (using scikit-learn).
- Task: Evaluate the model using metrics like R-squared and Mean Absolute Error.
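As a rough illustration of the Week 5 workflow, the sketch below trains and evaluates a linear regression with scikit-learn. The data is synthetic (a noisy line with slope 3 and intercept 5, chosen arbitrarily), and the 75/25 train/test split is an assumption rather than a course requirement.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 5 plus Gaussian noise (assumed for illustration).
rng = np.random.default_rng(seed=42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

# Hold out a test set so the metrics reflect unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"slope={model.coef_[0]:.2f} intercept={model.intercept_:.2f}")
print(f"R-squared={r2:.3f} MAE={mae:.3f}")
```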
Week 6 Assignment - Classification Models
- Task 1: Build a Logistic Regression model for binary classification (e.g., predict if a customer will buy a product based on features).
- Task 2: Build a K-Nearest Neighbors (KNN) model and evaluate its performance.
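One way to approach the Week 6 comparison is sketched below with scikit-learn. The dataset comes from `make_classification` as a stand-in for real customer data, and the choices of `n_neighbors=5`, feature scaling, and a stratified 75/25 split are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (stand-in for a real dataset).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features first: KNN relies on distances, so scale matters.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```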
Week 7 Assignment - Unsupervised Learning
- Task 1: Implement K-Means clustering on a dataset to identify clusters.
  - Example: Use a customer dataset to segment customers into groups.
- Task 2: Use PCA to reduce the dimensionality of the dataset and visualize the results.
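The Week 7 tasks might look roughly like the following scikit-learn sketch. It uses `make_blobs` to generate data in place of a real customer dataset, and the choice of three clusters and two principal components is an assumption for the example. The silhouette score is included as one possible way to evaluate the clustering.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic 5-feature data with three groups (stand-in for customer data).
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Task 1: K-Means clustering, evaluated with the silhouette score.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
score = silhouette_score(X_scaled, labels)

# Task 2: PCA down to two components, suitable for a 2-D scatter plot.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_

print(f"silhouette={score:.3f}")
print(f"variance explained by 2 components={explained.sum():.3f}")
```

The array `X_2d` can then be passed to a scatter plot (for example with Matplotlib), colored by `labels`, to visualize the clusters.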
Week 8 Assignment - Text Mining & Web App
- Task 1: Implement text mining techniques using NLTK or Regex:
  - Perform text cleaning, tokenization, and classification tasks.
- Task 2: Build a Streamlit web app that takes user input (like text or numbers), runs the model, and displays predictions.
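For the regex route in Task 1, a small sketch using only the standard library is shown below. The word lists and the rule-based `classify` function are hypothetical placeholders and not a trained model; a real submission would more likely train a classifier on labeled text.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
NON_WORD_PATTERN = re.compile(r"[^a-z0-9\s]")

# Hypothetical keyword lists, used only to illustrate a rule-based classifier.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible"}


def clean_text(text):
    """Lowercase, strip URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = NON_WORD_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleaned text into whitespace-separated tokens."""
    return clean_text(text).split()


def classify(text):
    """Label text by counting positive and negative keywords."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(tokenize("Great product!! See https://example.com NOW"))
print(classify("I love it, great value"))
```

A function such as `classify` is the kind of callable a Streamlit app could run on user input; that wiring is not shown here.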
Key Considerations for Assignments:
- Submission Format: Most assignments will be submitted as reports (PDF or Jupyter Notebooks) or code (Python scripts).
- Evaluation Criteria:
  - Correctness and efficiency of the code.
  - Clarity of explanations and reports.
  - Data analysis skills (EDA, feature selection, visualization).
  - Proper evaluation of machine learning models (accuracy, precision, recall, etc.).
- Extensions: Some assignments will have bonus tasks for more advanced challenges, such as working with larger datasets or using more advanced techniques like hyperparameter tuning.