The `fixest` R package provides various options for small sample corrections. While it has an excellent [vignette](https://cran.r-project.org/web/packages/fixest/vignettes/standard_errors.html) on the topic, reproducing its behavior in `pyfixest` took more time than expected. So that future developers (and my future self) can stay sane, I’ve compiled all of my hard-earned understanding of how small sample adjustments work in `fixest` and how they are implemented in `pyfixest` in this document.

In both `fixest` and `pyfixest`, small sample corrections are controlled by the `ssc` function. In `pyfixest`, `ssc` accepts four arguments: `k_adj`, `G_adj`, `k_fixef` and `G_df`.

Based on these inputs, the adjusted variance-covariance matrix is computed as:

```
vcov_k_adj = adj_val(N, df_k) if k_adj else 1
          * G_adj_val(G, G_df) if G_adj else 1
          * vcov
```

Where:

- **`k_adj`**: Enables or disables the first scalar adjustment.
- **`G_adj`**: Enables or disables the second scalar adjustment. This argument is only relevant / applied to clustered errors.
- **`vcov`**: The unadjusted variance-covariance matrix.
- **`df_k`**: The number of estimated parameters considered in the first adjustment. Impacts `adj_val`.
- **`k_fixef`**: Determines how `df_k` is computed (how fixed effects are counted).
- **`G_df`**: Determines how `G_adj_val` is computed (only relevant for multi-way clustering).
- **`G`**: The number of unique clusters (`G = N` for heteroskedastic errors).

Outside of this formula, we have **`df_t`**, which is the degrees of freedom used for p-values and confidence intervals:

- `df_t = N - df_k` for IID or heteroskedastic errors.
- `df_t = G - 1` for clustered errors.

---

# Small Sample Adjustments

## `k_adj = True`

If `k_adj = True`, the adjustment factor is:

`adj_val = (N - 1) / (N - df_k)`

If `k_adj = False`, no adjustment is applied.

Note that for `k_adj = True` and heteroskedastic errors, the applied correction is `N / (N-k)`.

---

## `k_fixef`

The `k_fixef` argument controls how fixed effects contribute to `df_k`, and thus to `adj_val`. It supports three options:

- **`"none"`**
- **`"full"`**
- **`"nonnested"`**

### `k_fixef = "none"`

Fixed effects are ignored when counting parameters:

- **Example**:
  - `Y ~ X1 | f1` → `k = 1`
  - `Y ~ X1 + X2 | f1` → `k = 2`

### `k_fixef = "full"`

Fixed effects are fully counted. For `n_fe` total fixed effects and each fixed effect `f_i`, we set `df_k = k + k_fe`,


- If there is **more than one** fixed effect, we drop one level from each fixed effects except the first (to avoid multicollinearity)
  `k_fe = sum_{i=1}^{n_fe} levels(f_i) - (n_fe - 1)`

- If there is **only one** fixed effect:
  `k_fe = sum_{i=1}^{n_fe} levels(f_i) = levels(f_1)`

### `k_fixef = "nonnested"`

Fixed effects may be **nested** within cluster variables (e.g., district FEs nested in state clusters). If `k_fixef = "nonnested"`, nested fixed effects do not count toward `k_fe`:

`k_fe = sum_{i=1}^{n_fe} levels(f_i) - k_fe_nested - (n_fe - 1)`

where `k_fe_nested` is the count of nested fixed effects. For cluster fixed effects, `k_fe_nested = G`, the number of clusters.

> ⚠️ *Note:* If you already subtracted a level from a nested FE, you may need to add it back.

---

## `G_adj`

This argument is only relevant for clustered errors.

If `G_adj = True`, we apply a second correction:

`G_df_val = G / (G - 1)`

Where:

- `G` is the number of clusters for clustered errors, or `N` for heteroskedastic errors.
- This follows the approach in R’s `sandwich` package, interpreting heteroskedastic errors as “singleton clusters.”

> *Tip:* If `G_adj = True` for IID errors, `G_df_val` defaults to `1`. For *heteroskedastic erros*, despite its name, `G_adj=True` will apply an adjustment of (N-1) / N, as there are $G = N$ singleton clusters.

---

## `G_df`

Relevant only for **multi-way clustering**. Two-way clustering, for example, can be written as:

`vcov = ssc_A * vcov_A + ssc_B * vcov_B - ssc_AB * vcov_AB`

where `A` and `B` are clustering dimensions, with `G_AB > G_A > G_B`.

- If `G_df = "min"`, then G is set to the minimum value of `G_A`, `G_B`, and `G_AB`.
- If `G_df = "conventional"`, each clustering dimension uses its own cluster count (`G_A`, `G_B`, etc.) for its respective adjustment.

---

# More on Inference

For computing critical values:

- **OLS and IV**: use t-statistics with `df_t = N - df_k` (non-clustered) or `df_t = G - 1` (clustered).
- **GLMs**: use z-statistics (normal approximation).

For multi-way clustering:

- **Two-way**: `df_t = min(G_1 - 1, G_2 - 1)`
- **Three-way**: `df_t = min(G_1 - 1, G_2 - 1, G_3 - 1)` *(not currently supported)*

See [this implementation](https://github.com/py-econometrics/pyfixest/blob/864da9c0d1797aff70e3f5b420e4c73f7256642d/pyfixest/estimation/feols_.py#L851) for details.

# In Code

All of the above logic is implemented [here](https://github.com/py-econometrics/pyfixest/blob/69acf9d22eab4300853d80264ee6d01bc4bdcb35/pyfixest/utils/utils.py#L108).
