Python & Data Fundamentals — Interview + Practical Guide

1. Python Core: Types, Control Flow, Collections

What interviewers expect

Know list/dict/set/tuple differences.
Understand mutability and references.
Write clean loops/comprehensions.
Use `dict.get`, `enumerate`, `zip` naturally.

🚨 Traps

Using a mutable default argument.
Confusing `is` vs `==`.
Modifying a list while iterating.
Relying on dict insertion order on Python versions older than 3.7, where it is not guaranteed.

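            # ✅ dict.get + enumerate + list comprehension
            prices = {"EUR": 1.0, "USD": 1.08}
            currencies = ["EUR", "USD", "GBP"]
            # .get returns None for a missing key instead of raising KeyError
            result = [(i, c, prices.get(c)) for i, c in enumerate(currencies)]
            result  # [(0, 'EUR', 1.0), (1, 'USD', 1.08), (2, 'GBP', None)]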

2. Functions & Idioms (Pythonic code)

            # ❌ Trap: mutable default argument
            def add_item(x, items=[]):  # the [] is created once, when `def` runs
                items.append(x)
                return items

            # ✅ Fix
            def add_item_safe(x, items=None):
                if items is None:
                    items = []  # a fresh list on every call
                items.append(x)
                return items
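Calling each version twice makes the difference visible. This sketch repeats both definitions so it runs on its own; the default list lives on the function object (in `__defaults__`), which is why state leaks between calls.

```python
def add_item(x, items=[]):
    # The default list is built once, when `def` executes,
    # so every call that omits `items` appends to that same object.
    items.append(x)
    return items


def add_item_safe(x, items=None):
    if items is None:
        items = []  # new list per call
    items.append(x)
    return items


first = add_item(1)
second = add_item(2)
print(first, second, first is second)  # [1, 2] [1, 2] True
print(add_item.__defaults__)  # ([1, 2],)

a = add_item_safe(1)
b = add_item_safe(2)
print(a, b)  # [1] [2]
```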

💡 “Senior” signals

Prefer pure functions for transformations.
Use generators for streaming (`yield`) when data is large.
Know `*args`, `**kwargs`, unpacking, and keyword-only args.
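The three signals above fit in a few lines. In this sketch the helpers `read_amounts`, `total`, `fmt`, and `describe` are hypothetical names chosen for illustration: a generator that parses lazily, `*args` with a keyword-only parameter after it, a bare `*` that forces keyword use, and `**kwargs`.

```python
from collections.abc import Iterable, Iterator


def read_amounts(lines: Iterable[str]) -> Iterator[float]:
    """Generator: yields one parsed amount at a time instead of building a list."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield float(line)


def total(*amounts: float, rate: float = 1.0) -> float:
    """*amounts collects positional args; `rate`, placed after it, is keyword-only."""
    return sum(amounts) * rate


def fmt(amount: float, *, currency: str = "EUR") -> str:
    """A bare * makes `currency` keyword-only: fmt(5, "USD") raises TypeError."""
    return f"{amount:.2f} {currency}"


def describe(**fields: object) -> str:
    """**fields collects arbitrary keyword arguments into a dict."""
    return ", ".join(f"{key}={value}" for key, value in sorted(fields.items()))


# Any iterable of lines works, e.g. an open file handle, so large inputs
# can be consumed one line at a time (list() here only for the demo).
values = list(read_amounts(["10", "", "20.5", "3"]))
print(values)  # [10.0, 20.5, 3.0]

doubled = total(*values, rate=2)  # *values unpacks the list into positional args
print(fmt(doubled, currency="USD"))  # 67.00 USD
print(describe(currency="EUR", source="api"))  # currency=EUR, source=api
```

Keyword-only parameters protect call sites: with `total(1, 2, 2)`, the last `2` is just another amount, never a silently misread `rate`.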

3. OOP & Dataclasses

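            # ✅ dataclass is great for data containers
            from dataclasses import dataclass
            from decimal import Decimal

            @dataclass(frozen=True)  # frozen: instances are immutable and hashable
            class Money:
                amount: Decimal  # Decimal, not float, avoids binary rounding errors with money
                currency: str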

⚠️ Trap: everything as a class

Python projects stay maintainable when you keep things simple: functions + small dataclasses + clear modules.
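Here is a sketch of that style: a frozen dataclass holding the data and a plain function doing the work. It assumes a `Decimal` amount (floats cannot represent most cents exactly), and `add_money` is a hypothetical helper name. It also shows the generated `__eq__`, the immutability guarantee, and `dataclasses.replace` for making modified copies.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal  # assumption: Decimal instead of float for exact cents
    currency: str


def add_money(a: Money, b: Money) -> Money:
    """A plain function, not a method: small, pure, easy to test."""
    if a.currency != b.currency:
        raise ValueError(f"currency mismatch: {a.currency} != {b.currency}")
    return Money(a.amount + b.amount, a.currency)


price = Money(Decimal("10.10"), "EUR")
fee = Money(Decimal("0.20"), "EUR")
total = add_money(price, fee)
print(total)  # Money(amount=Decimal('10.30'), currency='EUR')

# frozen=True: assignment raises instead of silently mutating shared data.
try:
    price.amount = Decimal("0")
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# To "change" a frozen instance, build a modified copy.
discounted = replace(price, amount=Decimal("9.00"))
```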

4. Exceptions & Error Handling

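            # ✅ use specific exceptions + avoid broad except
            # (the `int | None` annotation requires Python 3.10+)
            def parse_int(s: str) -> int | None:
                try:
                    return int(s)
                except ValueError:  # only the error that bad input is expected to raise
                    return None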

🚨 Trap: swallowing exceptions

`except Exception: pass` hides bugs. If you catch, do it intentionally: log, re-raise, or return an explicit error.
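A minimal sketch of "catch intentionally", assuming a hypothetical `ConfigError` domain exception and `load_rate` helper: one branch translates a low-level error with `raise ... from`, keeping the original as `__cause__`; the other logs with traceback via `logger.exception` and re-raises the same exception.

```python
import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical domain error for this sketch."""


def load_rate(raw: dict[str, str], currency: str) -> float:
    try:
        return float(raw[currency])
    except KeyError as err:
        # Translate to a domain error; `from err` keeps the original as __cause__.
        raise ConfigError(f"no rate configured for {currency!r}") from err
    except ValueError:
        # Log with traceback, then re-raise the same exception unchanged.
        logger.exception("bad rate value for %s: %r", currency, raw[currency])
        raise


rates = {"USD": "1.08", "GBP": "n/a"}
print(load_rate(rates, "USD"))  # 1.08

try:
    load_rate(rates, "EUR")
except ConfigError as err:
    caught = err  # keep a reference; `err` is unbound after the except block
    print(err, "| caused by", repr(err.__cause__))  # no rate configured for 'EUR' | caused by KeyError('EUR')
```

Calling `load_rate(rates, "GBP")` would log the parse failure and still propagate the `ValueError`, so the caller decides what happens next.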

5. Environment: venv, pip, packaging basics

            # ✅ minimal workflow
            python -m venv .venv
            source .venv/bin/activate  # (Windows: .venv\Scripts\activate)
            python -m pip install -U pip  # `python -m pip` lets pip upgrade itself on Windows too
            pip install pandas numpy pytest

What to say in interviews

“I isolate dependencies with venv, pin versions, and keep reproducible installs (requirements/lock files).”

6. NumPy Essentials

✅ Why NumPy

Fast vectorized operations (C-backed).
Memory-efficient arrays compared to Python lists.
Foundation for Pandas, SciPy, ML stacks.

            # ✅ vectorization
            import numpy as np
            x = np.array([1, 2, 3, 4])
            y = x * 10  # element-wise, fast
            y  # array([10, 20, 30, 40])

🚨 Trap: Python loops on large arrays

Looping in Python is slow. Prefer vectorized operations, boolean masks, and broadcasting.
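The loop-free alternatives named above look like this in practice. In this sketch the prices and the three FX rates are made-up illustration values; the mask replaces an `if` inside a loop, `np.where` replaces an if/else, and broadcasting replaces a nested loop.

```python
import numpy as np

prices = np.array([12.0, 250.0, 40.0, 999.0])

# Boolean mask: one comparison over the whole array, no Python loop.
expensive = prices > 100
print(prices[expensive])  # [250. 999.]

# np.where chooses element-wise between two precomputed results.
discounted = np.where(expensive, prices * 0.9, prices)

# Broadcasting: a (4, 1) column times a (3,) row gives a (4, 3) grid,
# i.e. every price converted with every (hypothetical) FX rate at once.
fx_rates = np.array([1.0, 1.08, 0.86])
converted = prices[:, np.newaxis] * fx_rates
print(converted.shape)  # (4, 3)
```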

7. Pandas Essentials

Core operations

Select/filter: `.loc`, boolean masks
Group: `.groupby()` + aggregations
Join: `.merge()`
Missing values: `.isna()`, `.fillna()`

🚨 Traps

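Chained indexing such as `df[mask]["col"] = value` (may write to a temporary copy and silently leave `df` unchanged).
Mixed dtypes (strings and numbers in one column) breaking numeric operations.
Loading huge DataFrames fully into memory instead of sampling or reading in chunks.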

            # ✅ groupby + merge (classic interview task)
            import pandas as pd
            orders = pd.DataFrame({"user_id": [1, 1, 2], "total": [10, 20, 5]})
            users = pd.DataFrame({"user_id": [1, 2], "name": ["A", "B"]})
            # total spend per user
            agg = orders.groupby("user_id", as_index=False)["total"].sum().rename(columns={"total": "spent"})
            # how="left" keeps users with no orders; their missing spend becomes 0
            out = users.merge(agg, on="user_id", how="left").fillna({"spent": 0})
            out
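Two of the traps above, mixed dtypes and chained indexing, have standard fixes. This sketch uses a made-up DataFrame: `pd.to_numeric(..., errors="coerce")` turns unparseable strings into `NaN`, and a single `.loc[row_mask, column]` call assigns into the DataFrame itself rather than into an intermediate object.

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 2, 3], "total": ["10", "abc", "5"]})

# Mixed dtypes: "total" holds strings, so numeric comparisons would fail.
# errors="coerce" converts what it can and turns the rest into NaN.
df["total"] = pd.to_numeric(df["total"], errors="coerce")

# ❌ Chained indexing: df[df["total"] > 6]["flag"] = True assigns to a
# temporary object, so df may be left unchanged (pandas warns about it).
# ✅ One .loc call with (row mask, column) writes into df itself.
df["flag"] = False
df.loc[df["total"] > 6, "flag"] = True

print(df)
```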

8. Data I/O: CSV, JSON, Parquet (and why it matters)

✅ Choose the right format

CSV: simple, universal, slower + no types.
JSON: nested, flexible, bigger files.
Parquet: columnar, fast, compressed (best for analytics).

⚠️ Trap

Reading huge CSVs without dtypes or chunks can explode memory. Prefer chunks or Parquet when possible.

            # ✅ read in chunks (memory friendly)
            import pandas as pd
            for chunk in pd.read_csv("big.csv", chunksize=100_000):
                chunk = chunk.dropna()
                # process chunk...
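One way to fill in "process chunk..." is to reduce each chunk to a small partial result and combine the partials at the end. This is a sketch under stated assumptions: an in-memory `io.StringIO` stands in for `big.csv`, the column names are hypothetical, and `chunksize=2` is tiny only so the demo produces more than one chunk. It also passes `usecols` and `dtype`, the other two memory levers mentioned above.

```python
import io

import pandas as pd

# Stand-in for a large file; in practice pass a path such as "big.csv".
csv_data = io.StringIO(
    "user_id,total\n"
    "1,10\n"
    "2,\n"
    "1,20\n"
    "3,5\n"
)

partials = []
for chunk in pd.read_csv(
    csv_data,
    chunksize=2,  # tiny for the demo; 100_000 or more for real files
    usecols=["user_id", "total"],  # load only the columns you need
    dtype={"user_id": "int32", "total": "float32"},  # smaller than int64/float64
):
    chunk = chunk.dropna()
    # Reduce each chunk right away so only small partial sums stay in memory.
    partials.append(chunk.groupby("user_id")["total"].sum())

# A user can appear in several chunks, so combine the partial sums.
spent = pd.concat(partials).groupby(level=0).sum()
print(spent)
```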

9. EDA Patterns (Exploratory Data Analysis)

Typical EDA checklist

Shape + dtypes + missing values
Basic stats: mean/median/std
Cardinality of categorical columns
Outliers (quantiles / IQR)

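            # ✅ fast EDA snippets (df is an existing pandas DataFrame)
            df.info()  # row count, dtypes, non-null counts, memory usage
            df.isna().mean().sort_values(ascending=False).head(10)  # share of missing values per column
            df.describe(include="all").T  # numeric and categorical stats, one row per column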
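The "Outliers (quantiles / IQR)" item from the checklist can be done with `Series.quantile`. This sketch implements Tukey's fences (values outside Q1 − 1.5·IQR and Q3 + 1.5·IQR); `iqr_outliers` is a hypothetical helper name and the order totals are made up.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return s[(s < low) | (s > high)]


# Hypothetical order totals with one obvious outlier.
totals = pd.Series([10, 12, 11, 13, 12, 95, 11, 10])
print(iqr_outliers(totals))
```

Unlike mean ± 3 standard deviations, the quartiles are barely moved by the outlier itself, which makes the IQR rule a robust first check.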

10. Performance Traps (Python reality)

⚠️ CPython & the GIL

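Threads don’t speed up CPU-bound pure-Python code in CPython, because the GIL lets only one thread execute Python bytecode at a time. Threads still help with I/O-bound work; for CPU-bound work, use multiprocessing or vectorization.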

🚨 Big trap in data code

Using `.apply()` row-by-row for heavy logic. Prefer vectorized ops or `groupby` transforms.
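Here is a side-by-side sketch on a small made-up order table: the row-wise `apply` and the column arithmetic give the same numbers, `np.where` covers simple conditional logic, and `groupby(...).transform("sum")` broadcasts a per-group total back onto every row so it can be used in further vectorized arithmetic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2],
    "qty": [2, 1, 5, 3, 1],
    "unit_price": [10.0, 4.0, 2.0, 2.0, 8.0],
})

# ❌ Row by row: one Python function call per row.
slow = df.apply(lambda row: row["qty"] * row["unit_price"], axis=1)

# ✅ Vectorized: a single column-wise multiplication.
df["line_total"] = df["qty"] * df["unit_price"]

# Conditional logic without apply: np.where works on whole columns.
df["bulk"] = np.where(df["qty"] >= 3, "yes", "no")

# ✅ transform returns a result aligned to the original rows,
# here each line's share of its user's total spend.
df["share"] = df["line_total"] / df.groupby("user_id")["line_total"].transform("sum")
```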

11. Testing with pytest (minimal but strong)

            # test_money.py
            def test_sum():
                assert (1 + 2) == 3

Interview sentence

“I keep pure functions and test them with pytest; I add integration tests for I/O boundaries.”

12. Typing & Code Quality

            # ✅ typing helps maintainability
            def normalize_currency(code: str) -> str:
                return code.strip().upper()

🚨 Trap: “Python has no types”

Python is dynamically typed, but modern projects add type hints to reduce bugs and clarify contracts. Hints are not enforced at runtime; static checkers such as mypy or pyright verify them.
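A short sketch of what hints do and do not do. The `Currency` alias, `convert` helper, and rates are hypothetical illustration values. A checker would reject `convert(10.0, "JPY")` because `"JPY"` is not a `Currency`, but at runtime the call only fails because the key is missing, not because of the annotation.

```python
from typing import Literal, get_type_hints

Currency = Literal["EUR", "USD", "GBP"]  # a type checker rejects other literals


def normalize_currency(code: str) -> str:
    return code.strip().upper()


def convert(amount: float, to: Currency) -> float:
    rates: dict[str, float] = {"EUR": 1.0, "USD": 1.08, "GBP": 0.86}  # hypothetical rates
    return amount * rates[to]


# Annotations are plain metadata that tools (and your code) can inspect.
print(get_type_hints(normalize_currency))  # {'code': <class 'str'>, 'return': <class 'str'>}

# Python itself does not check the hint: this fails with KeyError, not a type error.
try:
    convert(10.0, "JPY")  # type: ignore[arg-type]
except KeyError:
    print("KeyError at runtime")
```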

13. Quick Quiz (Self-check)

What is the difference between `is` and `==`?

`==` checks value equality. `is` checks identity (the same object in memory); use it for singletons such as `None`.

Why is vectorization faster than Python loops?

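Vectorized operations run as compiled C loops over typed, contiguous memory, so they avoid the interpreter's per-element overhead (bytecode dispatch and creating a Python object for each number).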

14. Most Common Interview Traps

Mutable default arguments

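The default is evaluated once, when the function is defined, so the same object is shared across calls. Use `None` as the default and create the list inside the function.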

Chained indexing in Pandas

Use a single `.loc[row_mask, column]` call so assignments modify the original DataFrame, not a temporary copy.

`apply()` everywhere

Prefer vectorization or `groupby` transformations for speed.

🎯 Final Advice

If you can explain mutability, write clean pandas groupby/merge code, and avoid the classic traps, you’ll already be ahead of most candidates.