Python & Data Fundamentals
The practical essentials: Python core, NumPy/Pandas basics, data I/O, EDA patterns, and interview traps.
1. Python Core: Types, Control Flow, Collections
What interviewers expect
- Know list/dict/set/tuple differences.
- Understand mutability and references.
- Write clean loops/comprehensions.
- Use `dict.get`, `enumerate`, `zip` naturally.
🚨 Traps
- Using a mutable default argument.
- Confusing `is` vs `==`.
- Modifying a list while iterating.
- Assuming dicts preserve insertion order in older Python versions (guaranteed only since 3.7).
prices = {"EUR": 1.0, "USD": 1.08}
currencies = ["EUR", "USD", "GBP"]
# dict.get returns None for missing keys such as "GBP", so no explicit default is needed
result = [(i, c, prices.get(c)) for i, c in enumerate(currencies)]
result
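The "modifying a list while iterating" trap from the list above can be sketched as follows. The sample values are hypothetical and chosen only to make the skipped element visible.

```python
# Hypothetical data, for illustration only.
values = [1, 2, 2, 3, 4]

# Buggy: removing while iterating shifts later elements left, so the
# iterator skips the element that follows each removal.
buggy = list(values)
for v in buggy:
    if v % 2 == 0:
        buggy.remove(v)

# Safer: build a new list instead of mutating the one being iterated.
odds = [v for v in values if v % 2 != 0]
```

The buggy loop leaves one even number behind because the second `2` slides into the slot the iterator has already passed.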
2. Functions & Idioms (Pythonic code)
# Bug: the default list is created once, at definition time, and shared across calls
def add_item(x, items=[]):
    items.append(x)
    return items

# ✅ Fix
def add_item_safe(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items
💡 "Senior" signals
- Prefer pure functions for transformations.
- Use generators for streaming (`yield`) when data is large.
- Know `*args`, `**kwargs`, unpacking, and keyword-only args.
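A minimal sketch of two of the signals above: a generator that streams results and a function with keyword-only arguments. The function names `running_total` and `convert` are hypothetical, introduced only for illustration.

```python
from typing import Iterable, Iterator


def running_total(amounts: Iterable[float]) -> Iterator[float]:
    """Yield the cumulative sum one value at a time, without building a list."""
    total = 0.0
    for amount in amounts:
        total += amount
        yield total


def convert(amount: float, *, rate: float, digits: int = 2) -> float:
    """`rate` and `digits` are keyword-only because they follow the bare `*`."""
    return round(amount * rate, digits)


totals = list(running_total([10, 20, 5]))
usd = convert(100, rate=1.08)
```

Calling `convert(100, 1.08)` raises `TypeError`, which is the point of the bare `*`: callers must name the rate, so call sites stay readable.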
3. OOP & Dataclasses
from dataclasses import dataclass

@dataclass(frozen=True)
class Money:
    amount: float
    currency: str
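A short sketch of what `frozen=True` provides for a class like `Money`: value equality, hashability, and blocked attribute assignment. The sample values and variable names are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Money:
    amount: float
    currency: str


price = Money(10.0, "EUR")
same_price = Money(10.0, "EUR")

# Dataclasses generate __eq__, so equal field values compare equal.
equal = price == same_price

# frozen=True (with the default eq=True) also generates __hash__,
# so instances can be used as set members or dict keys.
unique_prices = {price, same_price}

try:
    price.amount = 20.0  # assignment is blocked on a frozen instance
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```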
⚠️ Trap: everything as a class
Python projects stay maintainable when you keep things simple: functions + small dataclasses + clear modules.
4. Exceptions & Error Handling
def parse_int(s: str) -> int | None:
    try:
        return int(s)
    except ValueError:
        return None
🚨 Trap: swallowing exceptions
`except Exception: pass` hides bugs. If you catch, do it intentionally: log, re-raise, or return an explicit error.
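One way to catch intentionally is to handle only the expected exception type, log some context, and re-raise. This is a sketch under assumptions: `load_amount` is a hypothetical helper, not part of any library.

```python
import logging

logger = logging.getLogger(__name__)


def load_amount(raw: str) -> float:
    """Hypothetical helper: parse an amount, logging and re-raising on failure."""
    try:
        return float(raw)
    except ValueError:
        # Catch only the expected error, record context, then re-raise.
        logger.warning("could not parse amount: %r", raw)
        raise
```

A bare `raise` inside the `except` block preserves the original traceback, so the caller still sees where the failure started.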
5. Environment: venv, pip, packaging basics
python -m venv .venv
source .venv/bin/activate # (Windows: .venv\Scripts\activate)
pip install -U pip
pip install pandas numpy pytest
What to say in interviews
"I isolate dependencies with venv, pin versions, and keep reproducible installs (requirements/lock files)."
6. NumPy Essentials
✅ Why NumPy
- Fast vectorized operations (C-backed).
- Memory-efficient arrays compared to Python lists.
- Foundation for Pandas, SciPy, ML stacks.
import numpy as np
x = np.array([1, 2, 3, 4])
y = x * 10 # element-wise, fast
y
🚨 Trap: Python loops on large arrays
Element-by-element Python loops over large arrays are slow because every iteration runs through the interpreter. Prefer vectorized operations, boolean masks, and broadcasting.
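A minimal sketch of a boolean mask and of broadcasting, the two loop replacements named above. The arrays hold hypothetical sample data.

```python
import numpy as np

prices = np.array([5.0, 12.0, 30.0, 8.0])  # hypothetical sample data

# Boolean mask: select elements without an explicit loop.
expensive = prices[prices > 10]

# Broadcasting: the (4, 1) column combines with the (2,) row into a (4, 2) result.
rates = np.array([1.0, 2.0])  # hypothetical multipliers
table = prices[:, None] * rates
```

`prices[:, None]` adds a trailing axis, which is what lets every price pair with every rate without writing a nested loop.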
7. Pandas Essentials
Core operations
- Select/filter: `.loc`, boolean masks
- Group: `.groupby()` + aggregations
- Join: `.merge()`
- Missing values: `.isna()`, `.fillna()`
🚨 Traps
- Chained indexing (can produce silent bugs).
- Mixed dtypes (strings + numbers) messing up ops.
- Huge DataFrames in memory without sampling.
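The first two traps can be sketched like this, assuming a small hypothetical frame: a single `.loc` call replaces chained indexing, and `pd.to_numeric` turns a mixed string column into numbers.

```python
import pandas as pd

df = pd.DataFrame({"user_id": [1, 1, 2], "total": [10, 20, 5]})

# Chained form to avoid: df[df["user_id"] == 1]["total"] = 0
# The first [] may return a copy, so the assignment may never reach df.

# Single .loc call: row mask and column label in one indexing operation.
df.loc[df["user_id"] == 1, "total"] = 0

# Mixed dtypes: coerce strings to numbers; unparseable values become NaN.
raw = pd.Series(["10", "20", "n/a"])  # hypothetical messy column
clean = pd.to_numeric(raw, errors="coerce")
```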
import pandas as pd
orders = pd.DataFrame({"user_id":[1,1,2], "total":[10, 20, 5]})
users = pd.DataFrame({"user_id":[1,2], "name":["A", "B"]})
agg = orders.groupby("user_id", as_index=False)["total"].sum().rename(columns={"total":"spent"})
out = users.merge(agg, on="user_id", how="left").fillna({"spent":0})
out
8. Data I/O: CSV, JSON, Parquet (and why it matters)
✅ Choose the right format
- CSV: simple, universal, but slower to parse and carries no type information.
- JSON: nested, flexible, bigger files.
- Parquet: columnar, fast, compressed (best for analytics).
⚠️ Trap
Reading huge CSVs without dtypes or chunks can explode memory. Prefer chunks or Parquet when possible.
import pandas as pd
for chunk in pd.read_csv("big.csv", chunksize=100_000):
    chunk = chunk.dropna()
    # process chunk...
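A runnable variation on the chunked read that also passes explicit dtypes and folds each chunk into a running aggregate. The in-memory CSV, the column names, and the tiny `chunksize` are assumptions made so the sketch runs without a real file.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "user_id,total\n1,10.0\n1,20.0\n2,5.0\n2,\n"

spent = {}
reader = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"user_id": "int32", "total": "float32"},  # explicit, smaller dtypes
    chunksize=2,  # tiny here; far larger for real files
)
for chunk in reader:
    chunk = chunk.dropna()
    # Fold each chunk into a running aggregate instead of keeping the rows.
    for user_id, total in chunk.groupby("user_id")["total"].sum().items():
        spent[int(user_id)] = spent.get(int(user_id), 0.0) + float(total)
```

Only the small `spent` dictionary stays in memory between chunks, which is what keeps peak memory bounded.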
9. EDA Patterns (Exploratory Data Analysis)
Typical EDA checklist
- Shape + dtypes + missing values
- Basic stats: mean/median/std
- Cardinality of categorical columns
- Outliers (quantiles / IQR)
df.info()
df.isna().mean().sort_values(ascending=False).head(10)
df.describe(include="all").T
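The cardinality and outlier items of the checklist can be sketched as below. The frame and its `country` and `total` columns are hypothetical sample data, and the 1.5 × IQR fence is the common rule of thumb rather than something the checklist prescribes.

```python
import pandas as pd

# Hypothetical sample frame, for illustration only.
df = pd.DataFrame(
    {
        "country": ["FR", "FR", "DE", "ES", "FR", "DE"],
        "total": [10.0, 12.0, 11.0, 9.0, 13.0, 500.0],
    }
)

# Cardinality of a categorical column, plus its most frequent values.
n_countries = df["country"].nunique()
top_countries = df["country"].value_counts()

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["total"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["total"] < q1 - 1.5 * iqr) | (df["total"] > q3 + 1.5 * iqr)]
```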
10. Performance Traps (Python reality)
⚠️ CPython & the GIL
Threads don't speed up CPU-bound Python code in standard CPython builds: the GIL lets only one thread execute Python bytecode at a time. Use multiprocessing or vectorization.
🚨 Big trap in data code
Using `.apply()` row-by-row for heavy logic. Prefer vectorized ops or `groupby` transforms.
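A small sketch contrasting row-wise `.apply()` with the vectorized and `groupby` alternatives. The frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample frame, for illustration only.
df = pd.DataFrame({"user_id": [1, 1, 2], "qty": [2, 1, 4], "price": [5.0, 20.0, 1.5]})

# Row-by-row: calls a Python function once per row.
slow = df.apply(lambda row: row["qty"] * row["price"], axis=1)

# Vectorized: one column-wise multiplication.
df["line_total"] = df["qty"] * df["price"]

# groupby transform: per-user total aligned back to the original rows.
df["user_total"] = df.groupby("user_id")["line_total"].transform("sum")
```

Both approaches give the same numbers; the difference is that the vectorized version does the arithmetic in compiled code instead of once per row in Python.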
11. Testing with pytest (minimal but strong)
def test_sum():
    assert (1 + 2) == 3
Interview sentence
"I keep pure functions and test them with pytest; I add integration tests for I/O boundaries."
12. Typing & Code Quality
def normalize_currency(code: str) -> str:
    return code.strip().upper()
🚨 Trap: "Python has no types"
Python is dynamically typed, but modern projects use type hints to reduce bugs and clarify contracts.
13. Quick Quiz (Self-check)
What is the difference between `is` and `==`?
`==` checks value equality. `is` checks identity (same object in memory).
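A quick illustration of the answer, using two lists with equal contents:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

equal_values = a == b  # True: same contents
same_object = a is b   # False: two distinct list objects
alias = c is a         # True: c is another name for the same list

# Idiomatic use of `is`: comparisons against singletons such as None.
value = None
is_missing = value is None
```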
Why is vectorization faster than Python loops?
Vectorized ops run in optimized C/NumPy loops, reducing Python interpreter overhead.
14. Most Common Interview Traps
Mutable default arguments
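The default object is created once, when the function is defined, so it keeps state across calls. Use `None` as the default and create the object inside the function.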
Chained indexing in Pandas
Use `.loc[...]` to avoid unexpected behavior.
`apply()` everywhere
Prefer vectorization or `groupby` transformations for speed.
🎯 Final Advice
If you can explain mutability, write clean pandas groupby/merge, and avoid the classic traps, you'll already be above most candidates.