Python & Data Fundamentals

The practical essentials: Python core, NumPy/Pandas basics, data I/O, EDA patterns, and interview traps.

Python basics NumPy Pandas data cleaning typing pytest

1. Python Core: Types, Control Flow, Collections

What interviewers expect

  • Know list/dict/set/tuple differences.
  • Understand mutability and references.
  • Write clean loops/comprehensions.
  • Use `dict.get`, `enumerate`, `zip` naturally.

🚨 Traps

  • Using a mutable default argument.
  • Confusing `is` vs `==`.
  • Modifying a list while iterating.
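  • Assuming dict ordering on Python older than 3.7 (insertion order is only guaranteed from 3.7 onward).

# ✅ dict.get + enumerate + list comprehension
prices = {"EUR": 1.0, "USD": 1.08}
currencies = ["EUR", "USD", "GBP"]
# dict.get returns None for a missing key instead of raising KeyError
result = [(i, c, prices.get(c)) for i, c in enumerate(currencies)]
result  # [(0, 'EUR', 1.0), (1, 'USD', 1.08), (2, 'GBP', None)]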
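
The "modifying a list while iterating" trap from the list above is easier to remember once you have seen it skip an element. A minimal sketch, with made-up values:

```python
# Trap: removing from the list you are looping over shifts the remaining
# items left, so the element right after a removed one is never visited.
nums = [1, 2, 2, 3]
for n in nums:
    if n == 2:
        nums.remove(n)
# nums is now [1, 2, 3]: the second 2 survived

# Fix: build a new list instead of mutating during iteration.
cleaned = [n for n in [1, 2, 2, 3] if n != 2]
# cleaned is [1, 3]
```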

2. Functions & Idioms (Pythonic code)

# ❌ Trap: mutable default argument
def add_item(x, items=[]):
  items.append(x)
  return items

# ✅ Fix
def add_item_safe(x, items=None):
  if items is None:
    items = []
  items.append(x)
  return items
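
To see why the first version is a trap, call both functions twice. This sketch repeats the two definitions so it runs on its own:

```python
def add_item(x, items=[]):  # the default list is created once, at definition time
    items.append(x)
    return items

def add_item_safe(x, items=None):
    if items is None:
        items = []  # a fresh list on every call
    items.append(x)
    return items

first = add_item(1)
second = add_item(2)
# first and second are the same list object, now [1, 2]

safe_first = add_item_safe(1)
safe_second = add_item_safe(2)
# safe_first is [1], safe_second is [2]
```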

💡 “Senior” signals

  • Prefer pure functions for transformations.
  • Use generators for streaming (`yield`) when data is large.
  • Know `*args`, `**kwargs`, unpacking, and keyword-only args.

3. OOP & Dataclasses

# ✅ dataclass is great for data containers
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
  amount: float
  currency: str
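
A short usage sketch of what `frozen=True` gives you: value equality, hashability, and protection against accidental mutation. The sample values are made up:

```python
from dataclasses import dataclass, FrozenInstanceError, replace

@dataclass(frozen=True)
class Money:
    amount: float
    currency: str

price = Money(9.5, "EUR")
same = Money(9.5, "EUR")
values_equal = price == same   # generated __eq__ compares field values
lookup = {price: "cached"}     # frozen + eq makes instances hashable

try:
    price.amount = 10.0        # assignment on a frozen instance is rejected
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

discounted = replace(price, amount=8.0)  # returns a new instance, price is unchanged
```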

⚠️ Trap: everything as a class

Python projects stay maintainable when you keep things simple: functions + small dataclasses + clear modules.

4. Exceptions & Error Handling

# ✅ use specific exceptions + avoid broad except
def parse_int(s: str) -> int | None:
  try:
    return int(s)
  except ValueError:
    return None

🚨 Trap: swallowing exceptions

`except Exception: pass` hides bugs. If you catch, do it intentionally: log, rethrow, or return an explicit error.
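
One way to catch intentionally is to log and re-raise as a domain error. In this sketch `ConfigError` and `load_config` are hypothetical names, not part of any library:

```python
import json
import logging

logger = logging.getLogger(__name__)

class ConfigError(Exception):
    """Hypothetical domain-specific error for this sketch."""

def load_config(text: str) -> dict:
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Log for operators, then re-raise with context instead of hiding the failure.
        logger.warning("invalid config: %s", exc)
        raise ConfigError("config is not valid JSON") from exc
```

`raise ... from exc` keeps the original error as `__cause__`, so the traceback still shows where parsing failed.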

5. Environment: venv, pip, packaging basics

# ✅ minimal workflow
python -m venv .venv
source .venv/bin/activate # (Windows: .venv\Scripts\activate)
pip install -U pip
pip install pandas numpy pytest

What to say in interviews

“I isolate dependencies with venv, pin versions, and keep reproducible installs (requirements/lock files).”

6. NumPy Essentials

✅ Why NumPy

  • Fast vectorized operations (C-backed).
  • Memory-efficient arrays compared to Python lists.
  • Foundation for Pandas, SciPy, ML stacks.

# ✅ vectorization
import numpy as np
x = np.array([1, 2, 3, 4])
y = x * 10 # element-wise, fast
y

🚨 Trap: Python loops on large arrays

Looping in Python is slow. Prefer vectorized operations, boolean masks, and broadcasting.
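
Boolean masks and broadcasting, as mentioned above, might look like this. The arrays are made-up sample data:

```python
import numpy as np

prices = np.array([5.0, 12.0, 30.0, 8.0])

# Boolean mask: keep only the elements where the condition holds.
expensive = prices[prices > 10]

# Vectorized if/else: cap values above 10 without a Python loop.
capped = np.where(prices > 10, 10.0, prices)

# Broadcasting: a (3, 2) matrix minus a (2,) vector subtracts per column.
matrix = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
centered = matrix - matrix.mean(axis=0)
```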

7. Pandas Essentials

Core operations

  • Select/filter: `.loc`, boolean masks
  • Group: `.groupby()` + aggregations
  • Join: `.merge()`
  • Missing values: `.isna()`, `.fillna()`

🚨 Traps

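  • Chained indexing such as `df[mask]["col"] = value` (the assignment may hit a temporary copy and be silently lost).
  • Mixed dtypes (strings + numbers) in one column, which breaks arithmetic and comparisons.
  • Loading huge DataFrames into memory without sampling.

# ✅ groupby + merge (classic interview task)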
import pandas as pd
orders = pd.DataFrame({"user_id":[1,1,2], "total":[10, 20, 5]})
users = pd.DataFrame({"user_id":[1,2], "name":["A", "B"]})

agg = orders.groupby("user_id", as_index=False)["total"].sum().rename(columns={"total":"spent"})
out = users.merge(agg, on="user_id", how="left").fillna({"spent":0})
out
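
The first two traps in the list above have short, standard fixes. A sketch on a made-up frame where `total` arrived as strings:

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 2, 3], "total": ["10", "n/a", "5"]})

# Mixed dtypes: coerce to numbers, turning unparseable strings into NaN.
df["total"] = pd.to_numeric(df["total"], errors="coerce")

# One .loc call selects rows and column together, so the assignment
# lands on df itself rather than on a temporary copy.
mask = df["total"].isna()
df.loc[mask, "total"] = 0.0
```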

8. Data I/O: CSV, JSON, Parquet (and why it matters)

✅ Choose the right format

  • CSV: simple and universal, but slow to parse and carries no type information.
  • JSON: handles nested, flexible structures, but files are larger.
  • Parquet: columnar, compressed, and typed; usually the best fit for analytics.

⚠️ Trap

Reading huge CSVs without dtypes or chunks can explode memory. Prefer chunks or Parquet when possible.

# ✅ read in chunks (memory friendly)
import pandas as pd
for chunk in pd.read_csv("big.csv", chunksize=100_000):
  chunk = chunk.dropna()
  # process chunk...
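
Chunks combine well with explicit dtypes and a running aggregate, so only one chunk is in memory at a time. In this sketch an in-memory string stands in for the big file, and the column names are assumptions:

```python
import io
import pandas as pd

# Stand-in for a large file on disk; the columns are made up for this sketch.
csv_data = io.StringIO("user_id,total\n1,10.5\n1,20.0\n2,5.0\n2,\n")

running = {}
reader = pd.read_csv(
    csv_data,
    dtype={"user_id": "int32", "total": "float32"},  # smaller than the 64-bit defaults
    chunksize=2,
)
for chunk in reader:
    chunk = chunk.dropna(subset=["total"])
    # Fold each chunk's per-user sum into the running totals.
    for user_id, subtotal in chunk.groupby("user_id")["total"].sum().items():
        running[int(user_id)] = running.get(int(user_id), 0.0) + float(subtotal)
```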

9. EDA Patterns (Exploratory Data Analysis)

Typical EDA checklist

  • Shape + dtypes + missing values
  • Basic stats: mean/median/std
  • Cardinality of categorical columns
  • Outliers (quantiles / IQR)

# ✅ fast EDA snippets (df is an already-loaded DataFrame)
df.info()
df.isna().mean().sort_values(ascending=False).head(10)
df.describe(include="all").T
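
The cardinality and outlier items from the checklist can be sketched as follows. The data is made up, and the 1.5 × IQR fence is the common rule of thumb, not the only choice:

```python
import pandas as pd

df = pd.DataFrame({
    "amount": [10, 12, 11, 13, 12, 300],
    "country": ["FR", "FR", "DE", "DE", "FR", "ES"],
})

# Cardinality: number of distinct values in a categorical column.
cardinality = df["country"].nunique()

# IQR rule: flag values more than 1.5 * IQR outside the middle 50%.
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
```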

10. Performance Traps (Python reality)

⚠️ CPython & the GIL

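Threads don’t speed up CPU-bound pure-Python code in standard CPython, because the GIL lets only one thread execute Python bytecode at a time. Use multiprocessing or vectorization for CPU-bound work; threads still help with I/O-bound work.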

🚨 Big trap in data code

Using `.apply()` row-by-row for heavy logic. Prefer vectorized ops or `groupby` transforms.
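
A small before/after sketch of that trap, on a made-up frame. Both versions give the same numbers; the vectorized one avoids a Python-level call per row:

```python
import pandas as pd

df = pd.DataFrame({"qty": [1, 2, 3], "price": [10.0, 20.0, 5.0], "user_id": [1, 1, 2]})

# Slow pattern: a Python function is called once per row.
slow = df.apply(lambda row: row["qty"] * row["price"], axis=1)

# Vectorized: one column-wise multiplication.
fast = df["qty"] * df["price"]

# groupby transform: each row's share of its user's total, still without a row loop.
share = fast / fast.groupby(df["user_id"]).transform("sum")
```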

11. Testing with pytest (minimal but strong)

# test_money.py
def test_sum():
  assert (1 + 2) == 3

Interview sentence

“I keep pure functions and test them with pytest; I add integration tests for I/O boundaries.”
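
A slightly more realistic sketch tests a pure function, here a copy of `parse_int` from section 4. The file name is hypothetical; pytest collects any `test_*` function that uses plain `assert`:

```python
# test_parse.py (hypothetical file name)
def parse_int(s: str):
    try:
        return int(s)
    except ValueError:
        return None

def test_parse_int_valid():
    assert parse_int("42") == 42

def test_parse_int_ignores_surrounding_whitespace():
    assert parse_int(" 7 ") == 7

def test_parse_int_invalid_returns_none():
    assert parse_int("4x2") is None
```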

12. Typing & Code Quality

# ✅ typing helps maintainability
def normalize_currency(code: str) -> str:
  return code.strip().upper()

🚨 Trap: “Python has no types”

Python is dynamically typed, but modern projects use type hints to reduce bugs and clarify contracts.
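
As a sketch of hints clarifying a contract, the return type below tells callers that a lookup can fail. `to_eur` and the rates table are made up for the illustration:

```python
from typing import Optional

RATES_TO_EUR: dict[str, float] = {"EUR": 1.0, "USD": 0.93}  # made-up rates

def normalize_currency(code: str) -> str:
    return code.strip().upper()

def to_eur(amount: float, code: str) -> Optional[float]:
    """Return the amount in EUR, or None when the currency is unknown."""
    rate = RATES_TO_EUR.get(normalize_currency(code))
    return None if rate is None else amount * rate
```

A type checker such as mypy would then flag code that uses the result without handling `None`.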

13. Quick Quiz (Self-check)

What is the difference between `is` and `==`?

`==` checks value equality. `is` checks identity (same object in memory).
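
A quick check you can run to confirm the answer, using throwaway lists:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

equal_values = a == b   # True: same contents
same_object = a is b    # False: two distinct list objects
alias = a is c          # True: c is another name for the same list

value = None
is_missing = value is None  # identity is the idiomatic check for None
```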

Why is vectorization faster than Python loops?

Vectorized ops run in optimized C/NumPy loops, reducing Python interpreter overhead.

14. Most Common Interview Traps

Mutable default arguments

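The default object is created once, when the function is defined, so it keeps state across calls. Use `None` as the default and create the object inside the function.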

Chained indexing in Pandas

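Chained indexing may assign to a temporary copy, so the change can be silently lost. Select rows and columns in a single `.loc[...]` call instead.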

`apply()` everywhere

Prefer vectorization or `groupby` transformations for speed.

🎯 Final Advice

If you can explain mutability, write clean pandas groupby/merge, and avoid the classic traps, you’ll already be above most candidates.