{"id":248,"date":"2026-07-03T21:25:04","date_gmt":"2026-07-03T21:25:04","guid":{"rendered":"https:\/\/www.kindsonthegenius.com\/data-science\/?p=248"},"modified":"2026-07-03T21:27:47","modified_gmt":"2026-07-03T21:27:47","slug":"class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1","status":"publish","type":"post","link":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/","title":{"rendered":"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1"},"content":{"rendered":"<p><!-- ktg-updated-banner --><\/p>\n<div class=\"ktg-updated-banner\" style=\"margin:1em 0;padding:0.75em 1em;background:#eff6ff;border-left:4px solid #3b82f6;border-radius:4px;\">\n<p><strong>Updated July 3, 2026:<\/strong> Reviewed for accuracy and refreshed metadata.<\/p>\n<\/div>\n<p>This is Class 3 of our practical Data Science Course for Data Science Beginners. In this class, we will perform data preprocessing and data cleaning (or data cleansing). We will also discuss some of the theoretical concepts.<\/p>\n<p>We will be using the Titanic dataset. 
<a href=\"https:\/\/drive.google.com\/file\/d\/10Lsr1MvgORJl_XOv8LfrKXW9Hy-FcbOr\/view?usp=sharing\" target=\"_blank\" rel=\"noopener\">Get the Titanic Dataset here for free<\/a>.<\/p>\n<p>The following are covered:<\/p>\n<ol>\n<li><a href=\"#t1\">What is Data Preprocessing?<\/a><\/li>\n<li><a href=\"#t2\">Data Scaling<\/a><\/li>\n<li><a href=\"#t3\">Dropping and Interpolating Missing Data<\/a><\/li>\n<li><a href=\"#t4\">Data Normalisation<\/a><\/li>\n<li><a href=\"#t5\">Numerical and Categorical Values Conversion<\/a><\/li>\n<li><a href=\"#t5\">Data Binarization<\/a><\/li>\n<li><a href=\"#t5\">Data Standardization<\/a><\/li>\n<li><a href=\"#t5\">Data Labelling and Encoding<\/a><\/li>\n<li><a href=\"#t5\">Data Splitting &#8211; Feature and Class; Train &amp; Test<\/a><\/li>\n<\/ol>\n<p><a href=\"https:\/\/youtu.be\/ylhwP6wFEag\" target=\"_blank\" rel=\"noopener\">Class 3 Video on Preprocessing<\/a><\/p>\n<h4><strong id=\"t1\">1. What is Data Preprocessing?<\/strong><\/h4>\n<p>After obtaining your dataset and doing basic visualization, the next step is to preprocess it. Data preprocessing refers to the operations you perform on your data to ensure it works well with Machine Learning algorithms. It also helps the analytics process perform better. It includes data cleaning, outlier detection, data wrangling, normalization, data editing, removal of unreliable data, data conversion, etc.<\/p>\n<p>In this class, we will perform most of them on the Titanic dataset.<\/p>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t2\">2. Data Scaling or Rescaling<\/strong><\/h4>\n<p>Data scaling is a technique that ensures that the attributes of the dataset are on the same scale. Often, we rescale values to the range 0 to 1, which suits Machine Learning algorithms such as k-Nearest Neighbors and models trained with gradient descent.<\/p>\n<p>The scikit-learn library for Python provides a class called MinMaxScaler for performing scaling. 
It is available in the sklearn.preprocessing module (imported as pp in the code below).<\/p>\n<p>Take the four steps below to scale the data in the fare column of the Titanic dataset.<\/p>\n<p><strong>Step 1<\/strong> &#8211; Create the MinMaxScaler object<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<pre style=\"margin: 0; line-height: 125%;\">from sklearn import preprocessing as pp  <span style=\"color: #888888;\"># assumed import; pp is the alias used in this class<\/span>\r\n<span style=\"color: #888888;\"># Create a MinMaxScaler object<\/span>\r\ndata_scaler <span style=\"color: #333333;\">=<\/span> pp<span style=\"color: #333333;\">.<\/span>MinMaxScaler(feature_range<span style=\"color: #333333;\">=<\/span>(<span style=\"color: #0000dd; font-weight: bold;\">0<\/span>,<span style=\"color: #0000dd; font-weight: bold;\">1<\/span>))\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Step 2<\/strong> &#8211; Extract the fare column<br \/>\n<!-- HTML generated using hilite.me --><\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #888888;\"># Extract the fare column as a DataFrame (double brackets keep it 2-D)<\/span>\r\nfare_array <span style=\"color: #333333;\">=<\/span> titanic_df[[<span style=\"background-color: #fff0f0;\">'fare'<\/span>]]\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Step 3<\/strong> &#8211; Perform the scaling<br \/>\n<!-- HTML generated using hilite.me --><\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #888888;\"># Perform the scaling of the extracted column<\/span>\r\nfare_array_scaled <span style=\"color: #333333;\">=<\/span> data_scaler<span style=\"color: #333333;\">.<\/span>fit_transform(fare_array)\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Step 4<\/strong> &#8211; Replace the original column<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #888888;\"># Now replace the original column with the scaled column<\/span>\r\ntitanic_df[<span style=\"background-color: #fff0f0;\">'fare'<\/span>] <span style=\"color: #333333;\">=<\/span> fare_array_scaled\r\n<\/pre>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t3\">3. 
Dropping and Interpolating Missing Data<\/strong><\/h4>\n<p>Dropping and interpolating are data cleansing techniques used to handle missing values in a dataset. We can decide to drop a column if it does not contribute anything to the data analysis process, for example the name and ticket columns.<\/p>\n<p><strong>Drop Columns with Missing Values<\/strong><\/p>\n<p>Another reason we may drop a column is that it has many missing values. Examples are the body, boat and cabin columns of the Titanic dataset.<\/p>\n<p>To drop these columns, use the code below:<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #888888;\"># Drop Columns<\/span>\r\ncols_to_drop <span style=\"color: #333333;\">=<\/span> [<span style=\"background-color: #fff0f0;\">'body'<\/span>, <span style=\"background-color: #fff0f0;\">'boat'<\/span>, <span style=\"background-color: #fff0f0;\">'name'<\/span>, <span style=\"background-color: #fff0f0;\">'ticket'<\/span>, <span style=\"background-color: #fff0f0;\">'cabin'<\/span>]\r\ntitanic_df <span style=\"color: #333333;\">=<\/span> titanic_df<span style=\"color: #333333;\">.<\/span>drop(cols_to_drop, axis<span style=\"color: #333333;\">=<\/span><span style=\"color: #0000dd; font-weight: bold;\">1<\/span>)\r\n<\/pre>\n<p>The axis=1 argument indicates that we are dropping columns.<\/p>\n<p><strong>Interpolating Missing Values<\/strong><\/p>\n<p>If you have a column with very few missing values, you can choose to interpolate them using existing values. Interpolation is simply a way to create new data based on existing data. For example, if you have the sequence 2, 4, ?, 8, 10, then by linear interpolation the missing value will be 6. 
That is, (4+8)\/2, the average of its two neighbours.<\/p>\n<p>Let&#8217;s interpolate the age column of the Titanic dataset using the code below.<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #888888;\"># Replace missing values in the age column with interpolated values<\/span>\r\ntitanic_df[<span style=\"background-color: #fff0f0;\">'age'<\/span>] <span style=\"color: #333333;\">=<\/span> titanic_df[<span style=\"background-color: #fff0f0;\">'age'<\/span>]<span style=\"color: #333333;\">.<\/span>interpolate()\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Drop Rows with Missing Values<\/strong><\/p>\n<p>To drop all rows with missing values, we can use the code below. Here, we don&#8217;t specify the axis, because the default (axis=0) drops rows.<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #888888;\"># Drop all rows with missing data<\/span>\r\ntitanic_df <span style=\"color: #333333;\">=<\/span> titanic_df<span style=\"color: #333333;\">.<\/span>dropna()\r\n<\/pre>\n<p>&nbsp;<\/p>\n<h4><strong id=\"t4\">4. Data Normalisation<\/strong><\/h4>\n<p>Normalization is used when features have very different ranges of values. For example, some features have values of 0 or close to zero, while others have very high values, say in the 100s or 1000s. In this case, normalization rescales each record (row) so that it has a length of 1.<\/p>\n<p>There are two common types of normalization: L1 Normalization and L2 Normalization.<\/p>\n<p><strong>L1 Normalization<\/strong> &#8211; Also known as Manhattan normalization. Here, for each row of the dataset, the sum of the absolute values will always equal 1.<\/p>\n<p><strong>L2 Normalization<\/strong> &#8211; Also known as Euclidean normalization. Here, for each row of data, the square root of the sum of the squared values will always equal 1.<\/p>\n<p>To perform normalization, we simply create a normalizer object and proceed in the same way as we did for scaling. A code snippet is given below. 
See the video for a full explanation.<\/p>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #888888;\"># Perform Normalization on the parch column<\/span>\r\n<span style=\"color: #888888;\"># Note: Normalizer rescales each row, so with a single column every non-zero value becomes 1<\/span>\r\nnormalizer <span style=\"color: #333333;\">=<\/span> pp<span style=\"color: #333333;\">.<\/span>Normalizer(norm<span style=\"color: #333333;\">=<\/span><span style=\"background-color: #fff0f0;\">'l1'<\/span>) <span style=\"color: #888888;\"># use l2 for L2 Normalization<\/span>\r\nparch_array <span style=\"color: #333333;\">=<\/span> titanic_df[[<span style=\"background-color: #fff0f0;\">'parch'<\/span>]]\r\nparch_array_normalized <span style=\"color: #333333;\">=<\/span> normalizer<span style=\"color: #333333;\">.<\/span>transform(parch_array)\r\ntitanic_df[<span style=\"background-color: #fff0f0;\">'parch'<\/span>] <span style=\"color: #333333;\">=<\/span> parch_array_normalized\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><strong>Exercise<\/strong>: Perform L2 normalization on the Ash column of the wine dataset. (Try it, then see the video for the procedure and explanation.)<\/p>\n<p><strong id=\"t5\"><a href=\"#\">The remaining 5 points are covered in the next class, Part 2<\/a><\/strong><\/p>\n<ul>\n<li>5. Numerical and Categorical Values Conversion<\/li>\n<li>6. Data Binarization<\/li>\n<li>7. Data Standardization<\/li>\n<li>8. Data Labelling and Encoding<\/li>\n<li>9. 
Data Splitting &#8211; Feature and Class; Train &amp; Test<\/li>\n<\/ul>\n<p><a href=\"https:\/\/kindsonthegenius.com\/data-science\/class-4-introduction-to-data-preprocessing-and-data-cleaning-part-2\/\" target=\"_blank\" rel=\"noopener\">Go to Part 2<\/a><\/p>\n<p>&nbsp;<\/p>\n<p><!-- ktg-series-nav --><\/p>\n<nav class=\"ktg-series-nav\" aria-label=\"Data science class navigation\">\n<p><strong>Previous:<\/strong> <a href=\"https:\/\/kindsonthegenius.com\/data-science\/practical-data-science-class-for-data-science-beginners\/\">Class 1 &#8211; Practical Data Science Class For Data Science Beginners<\/a><\/p>\n<p><strong>Next:<\/strong> <a href=\"https:\/\/kindsonthegenius.com\/data-science\/class-4-introduction-to-data-preprocessing-and-data-cleaning-part-2\/\">Class 4 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 2<\/a><\/p>\n<\/nav>\n<p><!-- ktg-python-link --><\/p>\n<div class=\"ktg-python-link\" style=\"margin:1.5em 0;padding:1em;border:1px solid #e2e8f0;border-radius:6px;\">\n<p><strong>New to Python?<\/strong> Start with our <a href=\"https:\/\/www.kindsonthegenius.com\/python\/\">free Python tutorials<\/a> or the <a href=\"https:\/\/www.kindsonthegenius.com\/python\/python-in-10-days-simplified-for-non-programmers-a-preparation-for-data-science\/\">Python in 10 Days<\/a> data-science prep course.<\/p>\n<\/div>\n<p><!-- ktg-alkademy-cta --><\/p>\n<div class=\"ktg-alkademy-cta\" style=\"margin:2em 0;padding:1.25em;border-left:4px solid #2563eb;background:#f8fafc;\">\n<p><strong>Want live data science classes?<\/strong> Join <a href=\"https:\/\/www.alkademy.com\/courses\" target=\"_blank\" rel=\"noopener noreferrer\">Alkademy<\/a> for instructor-led data science and Python courses with hands-on projects.<\/p>\n<\/div>\n<p><!-- ktg-faq-schema --><br \/>\n<script type=\"application\/ld+json\">{\"@context\":\"https:\/\/schema.org\",\"@type\":\"FAQPage\",\"mainEntity\":[{\"@type\":\"Question\",\"name\":\"Why is data preprocessing 
important?\",\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"Clean, consistent data improves model accuracy and prevents garbage-in-garbage-out failures.\"}},{\"@type\":\"Question\",\"name\":\"What are common data cleaning steps?\",\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"Handle missing values, fix dtypes, remove duplicates, and standardize formats.\"}},{\"@type\":\"Question\",\"name\":\"Which Python libraries help with preprocessing?\",\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"Pandas, NumPy, and scikit-learn provide the core preprocessing toolkit.\"}}]}<\/script><\/p>\n<!-- AddThis Advanced Settings generic via filter on the_content --><!-- AddThis Share Buttons generic via filter on the_content -->","protected":false},"excerpt":{"rendered":"<p>Updated July 3, 2026: Reviewed for accuracy and refreshed metadata. This is Class three of our practical Science Course for Data Science Beginners. In this &hellip; <!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt --><\/p>\n","protected":false},"author":2,"featured_media":251,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[44,43],"tags":[54,55,53,56,57],"class_list":["post-248","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-python","tag-data-cleaning","tag-data-cleansing","tag-normalization","tag-preprocessing","tag-scaling"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1 | Data Science Tutorials<\/title>\n<meta name=\"description\" content=\"Data preprocessing and cleaning in Python \u2014 Part 1. Handle missing data, dtypes, and quality checks for ML pipelines. 
Free tutorial with runnable Python code.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1 | Data Science Tutorials\" \/>\n<meta property=\"og:description\" content=\"Data preprocessing and cleaning in Python \u2014 Part 1. Handle missing data, dtypes, and quality checks for ML pipelines. Free tutorial with runnable Python code.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/\" \/>\n<meta property=\"og:site_name\" content=\"Data Science Tutorials\" \/>\n<meta property=\"article:published_time\" content=\"2026-07-03T21:25:04+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-07-03T21:27:47+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/kindsonthegenius.com\/data-science\/wp-content\/uploads\/2021\/09\/Data-Science-Class-on-Preprocessing.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1805\" \/>\n\t<meta property=\"og:image:height\" content=\"802\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"kindsonthegenius\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"kindsonthegenius\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/\"},\"author\":{\"name\":\"kindsonthegenius\",\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/#\\\/schema\\\/person\\\/31dd138b160587ab3ea3c4746c59bfbc\"},\"headline\":\"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1\",\"datePublished\":\"2026-07-03T21:25:04+00:00\",\"dateModified\":\"2026-07-03T21:27:47+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/\"},\"wordCount\":745,\"commentCount\":0,\"image\":{\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/wp-content\\\/uploads\\\/2021\\\/09\\\/Data-Science-Class-on-Preprocessing.jpg\",\"keywords\":[\"Data Cleaning\",\"Data Cleansing\",\"Normalization\",\"Preprocessing\",\"Scaling\"],\"articleSection\":[\"Data 
Science\",\"Python\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/\",\"url\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/\",\"name\":\"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1 | Data Science Tutorials\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/wp-content\\\/uploads\\\/2021\\\/09\\\/Data-Science-Class-on-Preprocessing.jpg\",\"datePublished\":\"2026-07-03T21:25:04+00:00\",\"dateModified\":\"2026-07-03T21:27:47+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/#\\\/schema\\\/person\\\/31dd138b160587ab3ea3c4746c59bfbc\"},\"description\":\"Data preprocessing and cleaning in Python \u2014 Part 1. Handle missing data, dtypes, and quality checks for ML pipelines. 
Free tutorial with runnable Python code.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/#primaryimage\",\"url\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/wp-content\\\/uploads\\\/2021\\\/09\\\/Data-Science-Class-on-Preprocessing.jpg\",\"contentUrl\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/wp-content\\\/uploads\\\/2021\\\/09\\\/Data-Science-Class-on-Preprocessing.jpg\",\"width\":1805,\"height\":802,\"caption\":\"Data Science Class on Data Preprocessing\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/#website\",\"url\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/\",\"name\":\"Data Science Tutorials\",\"description\":\"Data Science and Machine Learning in Python and 
R\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/#\\\/schema\\\/person\\\/31dd138b160587ab3ea3c4746c59bfbc\",\"name\":\"kindsonthegenius\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b9d710de456c3d85e5614c3a6992fa3d527425e2ab32b8bd5d85bfbaa235004b?s=96&d=wavatar&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b9d710de456c3d85e5614c3a6992fa3d527425e2ab32b8bd5d85bfbaa235004b?s=96&d=wavatar&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b9d710de456c3d85e5614c3a6992fa3d527425e2ab32b8bd5d85bfbaa235004b?s=96&d=wavatar&r=g\",\"caption\":\"kindsonthegenius\"},\"url\":\"https:\\\/\\\/kindsonthegenius.com\\\/data-science\\\/author\\\/kindsonthegenius-3\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1 | Data Science Tutorials","description":"Data preprocessing and cleaning in Python \u2014 Part 1. Handle missing data, dtypes, and quality checks for ML pipelines. 
Free tutorial with runnable Python code.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/","og_locale":"en_US","og_type":"article","og_title":"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1 | Data Science Tutorials","og_description":"Data preprocessing and cleaning in Python \u2014 Part 1. Handle missing data, dtypes, and quality checks for ML pipelines. Free tutorial with runnable Python code.","og_url":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/","og_site_name":"Data Science Tutorials","article_published_time":"2026-07-03T21:25:04+00:00","article_modified_time":"2026-07-03T21:27:47+00:00","og_image":[{"width":1805,"height":802,"url":"https:\/\/kindsonthegenius.com\/data-science\/wp-content\/uploads\/2021\/09\/Data-Science-Class-on-Preprocessing.jpg","type":"image\/jpeg"}],"author":"kindsonthegenius","twitter_card":"summary_large_image","twitter_misc":{"Written by":"kindsonthegenius","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/#article","isPartOf":{"@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/"},"author":{"name":"kindsonthegenius","@id":"https:\/\/kindsonthegenius.com\/data-science\/#\/schema\/person\/31dd138b160587ab3ea3c4746c59bfbc"},"headline":"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1","datePublished":"2026-07-03T21:25:04+00:00","dateModified":"2026-07-03T21:27:47+00:00","mainEntityOfPage":{"@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/"},"wordCount":745,"commentCount":0,"image":{"@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/#primaryimage"},"thumbnailUrl":"https:\/\/kindsonthegenius.com\/data-science\/wp-content\/uploads\/2021\/09\/Data-Science-Class-on-Preprocessing.jpg","keywords":["Data Cleaning","Data Cleansing","Normalization","Preprocessing","Scaling"],"articleSection":["Data Science","Python"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/","url":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/","name":"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1 | Data Science 
Tutorials","isPartOf":{"@id":"https:\/\/kindsonthegenius.com\/data-science\/#website"},"primaryImageOfPage":{"@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/#primaryimage"},"image":{"@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/#primaryimage"},"thumbnailUrl":"https:\/\/kindsonthegenius.com\/data-science\/wp-content\/uploads\/2021\/09\/Data-Science-Class-on-Preprocessing.jpg","datePublished":"2026-07-03T21:25:04+00:00","dateModified":"2026-07-03T21:27:47+00:00","author":{"@id":"https:\/\/kindsonthegenius.com\/data-science\/#\/schema\/person\/31dd138b160587ab3ea3c4746c59bfbc"},"description":"Data preprocessing and cleaning in Python \u2014 Part 1. Handle missing data, dtypes, and quality checks for ML pipelines. Free tutorial with runnable Python code.","breadcrumb":{"@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/#primaryimage","url":"https:\/\/kindsonthegenius.com\/data-science\/wp-content\/uploads\/2021\/09\/Data-Science-Class-on-Preprocessing.jpg","contentUrl":"https:\/\/kindsonthegenius.com\/data-science\/wp-content\/uploads\/2021\/09\/Data-Science-Class-on-Preprocessing.jpg","width":1805,"height":802,"caption":"Data Science Class on Data 
Preprocessing"},{"@type":"BreadcrumbList","@id":"https:\/\/kindsonthegenius.com\/data-science\/class-3-introduction-to-data-preprocessing-and-data-cleaning-part-1\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/kindsonthegenius.com\/data-science\/"},{"@type":"ListItem","position":2,"name":"Class 3 &#8211; Introduction to Data Preprocessing and Data Cleaning &#8211; Part 1"}]},{"@type":"WebSite","@id":"https:\/\/kindsonthegenius.com\/data-science\/#website","url":"https:\/\/kindsonthegenius.com\/data-science\/","name":"Data Science Tutorials","description":"Data Science and Machine Learning in Python and R","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/kindsonthegenius.com\/data-science\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/kindsonthegenius.com\/data-science\/#\/schema\/person\/31dd138b160587ab3ea3c4746c59bfbc","name":"kindsonthegenius","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/b9d710de456c3d85e5614c3a6992fa3d527425e2ab32b8bd5d85bfbaa235004b?s=96&d=wavatar&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/b9d710de456c3d85e5614c3a6992fa3d527425e2ab32b8bd5d85bfbaa235004b?s=96&d=wavatar&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b9d710de456c3d85e5614c3a6992fa3d527425e2ab32b8bd5d85bfbaa235004b?s=96&d=wavatar&r=g","caption":"kindsonthegenius"},"url":"https:\/\/kindsonthegenius.com\/data-science\/author\/kindsonthegenius-3\/"}]}},"jetpack_featured_media_url":"https:\/\/kindsonthegenius.com\/data-science\/wp-content\/uploads\/2021\/09\/Data-Science-Class-on-Preprocessing.jpg","_links":{"self":[{"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/posts\/248","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kindsont
hegenius.com\/data-science\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/comments?post=248"}],"version-history":[{"count":8,"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/posts\/248\/revisions"}],"predecessor-version":[{"id":338,"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/posts\/248\/revisions\/338"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/media\/251"}],"wp:attachment":[{"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/media?parent=248"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/categories?post=248"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kindsonthegenius.com\/data-science\/wp-json\/wp\/v2\/tags?post=248"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}