Spaces:

AashishAIHub
/

DataScience

Running

File size: 5,718 Bytes

854c114

{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# ML Practice Series: Module 22 - SQL & Databases for Data Science\n",
                "\n",
                "In the real world, data lives in databases, not just CSVs. This module teaches you how to bridge the gap between **SQL (Structured Query Language)** and **Python/Pandas**.\n",
                "\n",
                "### Objectives:\n",
                "1. **Connecting to Databases**: Using `sqlite3` (built into Python).\n",
                "2. **Basic Queries**: SELECT, WHERE, and JOIN in Python.\n",
                "3. **SQL to Pandas**: Loading query results directly into a DataFrame.\n",
                "4. **Database Design**: Understanding primary keys and foreign keys.\n",
                "\n",
                "---"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 1. Setting up a Virtual Database\n",
                "We will create an in-memory database and populate it with some sample Data Science job data."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import sqlite3\n",
                "import pandas as pd\n",
                "\n",
                "# Create a connection to an in-memory database\n",
                "conn = sqlite3.connect(':memory:')\n",
                "cursor = conn.cursor()\n",
                "\n",
                "# Create a sample table\n",
                "cursor.execute('''\n",
                "    CREATE TABLE jobs (\n",
                "        id INTEGER PRIMARY KEY,\n",
                "        title TEXT,\n",
                "        company TEXT,\n",
                "        salary INTEGER\n",
                "    )\n",
                "''')\n",
                "\n",
                "# Insert sample records\n",
                "jobs = [\n",
                "    (1, 'Data Scientist', 'Google', 150000),\n",
                "    (2, 'ML Engineer', 'Tesla', 160000),\n",
                "    (3, 'Data Analyst', 'Netflix', 120000),\n",
                "    (4, 'AI Research', 'OpenAI', 200000)\n",
                "]\n",
                "cursor.executemany('INSERT INTO jobs VALUES (?,?,?,?)', jobs)\n",
                "conn.commit()\n",
                "\n",
                "print(\"Database created and table populated!\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 2. Basic SQL Queries in Python\n",
                "\n",
                "### Task 1: Fetching Data\n",
                "Use standard SQL to fetch all jobs where the salary is greater than 140,000."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "query = \"SELECT * FROM jobs WHERE salary > 140000\"\n",
                "cursor.execute(query)\n",
                "results = cursor.fetchall()\n",
                "for row in results:\n",
                "    print(row)\n",
                "```\n",
                "</details>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## 3. SQL to Pandas: The Professional Way\n",
                "\n",
                "### Task 2: pd.read_sql_query\n",
                "Professionals use `pd.read_sql_query()` to pull data directly into a DataFrame. Try it now."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# YOUR CODE HERE\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "<details>\n",
                "<summary><b>Click to see Solution</b></summary>\n",
                "\n",
                "```python\n",
                "df_sql = pd.read_sql_query(\"SELECT * FROM jobs\", conn)\n",
                "print(df_sql.head())\n",
                "```\n",
                "</details>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "--- \n",
                "### Bridge Completed! \n",
                "You now know how to pull data from any standard relational database.\n",
                "Next: **Model Explainability (SHAP)**."
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.12.7"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}