Utils & Metrics

_clip

pyldl.algorithms.utils._clip(func)

_reduction

pyldl.algorithms.utils._reduction(func)

binaryzation

pyldl.algorithms.utils.binaryzation(D: ndarray, method='threshold', param: any = None) → ndarray

Transform label distribution matrix to logical label matrix.

Parameters:
  • D (np.ndarray) – Label distribution matrix (shape: \([n,\, l]\)).

  • method ({'threshold', 'topk'}, optional) –

    Binaryzation method, defaults to ‘threshold’. The options are ‘threshold’ and ‘topk’; for details, refer to:

    [BIN-KWT+24]

    Zhiqiang Kou, Jing Wang, Jiawei Tang, Yuheng Jia, Boyu Shi, and Xin Geng. Exploiting multi-label correlation in label distribution learning. In Proceedings of the International Joint Conference on Artificial Intelligence, 4326–4334. 2024. URL: https://doi.org/10.24963/ijcai.2024/478.

  • param (any, optional) – Parameter of the binaryzation method, defaults to None. If None, the default value is 0.5 for ‘threshold’ and \(\lfloor l / 2 \rfloor\) for ‘topk’.

Returns:

Logical label matrix (shape: \([n,\, l]\)).

Return type:

np.ndarray
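
As a rough illustration of the ‘topk’ option, the sketch below marks the k largest entries of each row as 1, with k falling back to \(\lfloor l / 2 \rfloor\) as stated above. The helper name topk_binarize and its tie handling are assumptions made for this example, not the library's implementation, and the ‘threshold’ option is not covered here.

```python
import numpy as np

def topk_binarize(D, k=None):
    """Hypothetical helper: set the k largest entries of each row to 1."""
    D = np.asarray(D, dtype=float)
    n, l = D.shape
    if k is None:
        k = l // 2  # default stated in the parameter description
    # Indices of the k largest entries per row (stable sort; ties keep the earlier label).
    idx = np.argsort(-D, axis=1, kind="stable")[:, :k]
    L = np.zeros((n, l), dtype=int)
    np.put_along_axis(L, idx, 1, axis=1)
    return L

D = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.1, 0.2, 0.3, 0.4]])
L = topk_binarize(D)  # -> [[1, 1, 0, 0], [0, 0, 1, 1]]
```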

kl_divergence

pyldl.algorithms.utils.kl_divergence(D, D_pred)

Kullback-Leibler divergence. It is defined as:

\[\text{KLD}(\boldsymbol{u}, \, \boldsymbol{v}) = \sum^l_{j=1}u_j \ln \frac{u_j}{v_j}\text{.}\]
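
A minimal NumPy sketch of the formula above, evaluated row by row. The function name kl_divergence_rows and the clipping constant eps are assumptions made for this example; how the library clips or reduces the per-sample values is not shown here.

```python
import numpy as np

def kl_divergence_rows(D, D_pred, eps=1e-12):
    """Hypothetical helper: KLD(u, v) for each pair of rows."""
    # Clipping avoids log(0) and division by zero (assumed safeguard).
    U = np.clip(np.asarray(D, dtype=float), eps, 1.0)
    V = np.clip(np.asarray(D_pred, dtype=float), eps, 1.0)
    return np.sum(U * np.log(U / V), axis=1)

D = np.array([[0.5, 0.5]])
D_pred = np.array([[0.25, 0.75]])
kld = kl_divergence_rows(D, D_pred)  # one value per row
```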

pairwise_cosine

pyldl.algorithms.utils.pairwise_cosine(X: ndarray | Tensor, Y: ndarray | Tensor | None = None, mode: str = 'similarity') → ndarray | Tensor

Pairwise cosine distance/similarity.

Parameters:
  • X (Union[np.ndarray, tf.Tensor]) – Matrix \(\boldsymbol{X}\) (shape: \([m_{\boldsymbol{X}},\, n]\)).

  • Y (Union[np.ndarray, tf.Tensor], optional) – Matrix \(\boldsymbol{Y}\) (shape: \([m_{\boldsymbol{Y}},\, n]\)), defaults to None.

  • mode (str) – Defaults to ‘similarity’. The options are ‘similarity’ and ‘distance’.

Returns:

Pairwise cosine similarity or distance, depending on mode (shape: \([m_{\boldsymbol{X}},\, m_{\boldsymbol{Y}}]\)).

Return type:

Union[np.ndarray, tf.Tensor]
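
The computation can be sketched in NumPy as follows. This is an illustration only: the name pairwise_cosine_np is hypothetical, and two behaviours are assumed rather than taken from the library, namely that Y falls back to X when it is None and that ‘distance’ means one minus the similarity. Rows are assumed to be nonzero.

```python
import numpy as np

def pairwise_cosine_np(X, Y=None, mode="similarity"):
    """Hypothetical helper: cosine similarity/distance between all row pairs."""
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)  # assumed fallback
    # Normalise each row to unit length, then take all inner products.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    S = Xn @ Yn.T
    if mode == "similarity":
        return S
    if mode == "distance":
        return 1.0 - S  # assumed definition of cosine distance
    raise ValueError("mode must be 'similarity' or 'distance'")

X = np.array([[1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0, 2.0]])
S = pairwise_cosine_np(X, Y)  # shape (2, 1)
```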

pairwise_euclidean

pyldl.algorithms.utils.pairwise_euclidean(X: ndarray | Tensor, Y: ndarray | Tensor | None = None) → ndarray | Tensor

Pairwise Euclidean distance.

Parameters:
  • X (Union[np.ndarray, tf.Tensor]) – Matrix \(\boldsymbol{X}\) (shape: \([m_{\boldsymbol{X}},\, n]\)).

  • Y (Union[np.ndarray, tf.Tensor], optional) – Matrix \(\boldsymbol{Y}\) (shape: \([m_{\boldsymbol{Y}},\, n]\)), if None, \(\boldsymbol{Y} = \boldsymbol{X}\), defaults to None.

Returns:

Pairwise Euclidean distance (shape: \([m_{\boldsymbol{X}},\, m_{\boldsymbol{Y}}]\)).

Return type:

Union[np.ndarray, tf.Tensor]
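
One common way to compute such a distance matrix without explicit loops uses the expansion \(\Vert x - y \Vert^2 = \Vert x \Vert^2 - 2 x^\top y + \Vert y \Vert^2\). The NumPy sketch below illustrates that idea; the name pairwise_euclidean_np is hypothetical and the library may compute the result differently.

```python
import numpy as np

def pairwise_euclidean_np(X, Y=None):
    """Hypothetical helper: Euclidean distance between all row pairs."""
    X = np.asarray(X, dtype=float)
    Y = X if Y is None else np.asarray(Y, dtype=float)  # Y = X when None, as documented
    sq = (
        np.sum(X**2, axis=1)[:, None]
        - 2.0 * (X @ Y.T)
        + np.sum(Y**2, axis=1)[None, :]
    )
    # Round-off can produce tiny negative values; clamp before the square root.
    return np.sqrt(np.maximum(sq, 0.0))

X = np.array([[0.0, 0.0], [3.0, 4.0]])
dist = pairwise_euclidean_np(X)  # -> [[0, 5], [5, 0]]
```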

proj

pyldl.algorithms.utils.proj(D: ndarray) → ndarray

Projection onto the probability simplex. This approach is proposed in [Con16].

Parameters:

D (np.ndarray) – Matrix \(\boldsymbol{D}\).

Returns:

The projection onto the probability simplex.

Return type:

np.ndarray
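
To make the operation concrete, here is a sketch of Euclidean projection of a single vector onto the probability simplex. It uses the classic sort-based method rather than the faster algorithm of [Con16], and the name project_simplex is hypothetical; whether the library projects row by row is not shown here.

```python
import numpy as np

def project_simplex(v):
    """Hypothetical helper: closest point to v with entries >= 0 summing to 1."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                 # entries in descending order
    css = np.cumsum(u) - 1.0             # cumulative sums minus the target total
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)         # shift applied to every entry
    return np.maximum(v - theta, 0.0)

p = project_simplex([2.0, 0.0])  # -> [1.0, 0.0]
```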

soft_thresholding

pyldl.algorithms.utils.soft_thresholding(A: ndarray, tau: float) → ndarray

Soft thresholding operation. It is defined as \(\text{soft}(\boldsymbol{A}, \, \tau) = \text{sgn}(\boldsymbol{A}) \odot \max\lbrace \lvert \boldsymbol{A} \rvert - \tau, 0 \rbrace\), where \(\odot\) denotes element-wise multiplication.

Parameters:
  • A (np.ndarray) – Matrix \(\boldsymbol{A}\).

  • tau (float) – \(\tau\).

Returns:

The result of soft thresholding operation.

Return type:

np.ndarray
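
The definition above translates almost directly into NumPy. The function name soft below is hypothetical and the snippet is meant only as an illustration of the formula.

```python
import numpy as np

def soft(A, tau):
    """Hypothetical helper: element-wise sgn(A) * max(|A| - tau, 0)."""
    A = np.asarray(A, dtype=float)
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

A = np.array([[3.0, -0.5], [-2.0, 1.0]])
S = soft(A, 1.0)  # entries with |a| <= 1 become 0, the rest shrink towards 0 by 1
```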

solvel21

pyldl.algorithms.utils.solvel21(A: ndarray, tau: float) → ndarray

This approach is proposed in paper [CY14].

The solution to the optimization problem \(\mathop{\arg\min}_{\boldsymbol{X}} \frac{1}{2} \Vert \boldsymbol{X} - \boldsymbol{A} \Vert_\text{F}^2 + \tau \Vert \boldsymbol{X} \Vert_{2,\,1}\) is given by the following formula:

\[\begin{split}\vec{x}_{\bullet j}^{\ast} = \left\{ \begin{aligned} & \frac{\Vert \vec{a}_{\bullet j} \Vert - \tau}{\Vert \vec{a}_{\bullet j} \Vert} \vec{a}_{\bullet j}, & \tau \le \Vert \vec{a}_{\bullet j} \Vert \\ & 0, & \text{otherwise} \end{aligned} \right.\text{,}\end{split}\]

where \(\vec{x}_{\bullet j}\) is the \(j\)-th column of matrix \(\boldsymbol{X}\), and \(\vec{a}_{\bullet j}\) is the \(j\)-th column of matrix \(\boldsymbol{A}\).

Parameters:
  • A (np.ndarray) – Matrix \(\boldsymbol{A}\).

  • tau (float) – \(\tau\).

Returns:

The solution to the optimization problem.

Return type:

np.ndarray
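
The column-wise closed form above can be sketched in NumPy as follows. The name l21_shrink is hypothetical, \(\tau \ge 0\) is assumed, and the guard against zero-norm columns is an addition for this example.

```python
import numpy as np

def l21_shrink(A, tau):
    """Hypothetical helper: shrink each column of A towards zero by tau in norm."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=0)  # Euclidean norm of every column
    # (norm - tau) / norm where tau <= norm, 0 otherwise; tiny avoids 0 / 0.
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, np.finfo(float).tiny)
    return A * scale  # broadcasts the per-column factor over the rows

A = np.array([[3.0, 0.1],
              [4.0, 0.1]])
X = l21_shrink(A, 1.0)  # first column scaled by 0.8, second column zeroed
```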

svt

pyldl.algorithms.utils.svt(A: ndarray, tau: float) → ndarray

Singular value thresholding (SVT) is proposed in paper [CCS10].

The solution to the optimization problem \(\mathop{\arg\min}_{\boldsymbol{X}} \frac{1}{2} \Vert \boldsymbol{X} - \boldsymbol{A} \Vert_\text{F}^2 + \tau \Vert \boldsymbol{X} \Vert_{\ast}\) is given by \(\boldsymbol{U} \max \lbrace \boldsymbol{\Sigma} - \tau, 0 \rbrace \boldsymbol{V}^\top\), where \(\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^\top\) is the singular value decomposition of matrix \(\boldsymbol{A}\) and the maximum is applied to the singular values on the diagonal of \(\boldsymbol{\Sigma}\).

Parameters:
  • A (np.ndarray) – Matrix \(\boldsymbol{A}\).

  • tau (float) – \(\tau\).

Returns:

The solution to the optimization problem.

Return type:

np.ndarray
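
A short NumPy sketch of the expression \(\boldsymbol{U} \max \lbrace \boldsymbol{\Sigma} - \tau, 0 \rbrace \boldsymbol{V}^\top\): compute a thin SVD, shrink the singular values, and recombine. The name svt_np is hypothetical and this is an illustration rather than the library's code.

```python
import numpy as np

def svt_np(A, tau):
    """Hypothetical helper: soft-threshold the singular values of A by tau."""
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scaling the columns of U by s_shrunk equals U @ diag(s_shrunk).
    return (U * s_shrunk) @ Vt

A = np.diag([3.0, 1.0])
X = svt_np(A, 2.0)  # singular values 3 and 1 become 1 and 0
```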

artificial

pyldl.utils.artificial(X, a=1.0, b=0.5, c=0.2, d=1.0, w1=array([[4., 2., 1.]]), w2=array([[1., 2., 4.]]), w3=array([[1., 4., 2.]]), lambda1=0.01, lambda2=0.01)

download_dataset

pyldl.utils.download_dataset(name, dataset_path)

emphasize

pyldl.utils.emphasize(D, rate=0.5, **kwargs)

load_dataset

pyldl.utils.load_dataset(name, dir='dataset')

make_ldl

pyldl.utils.make_ldl(n_samples=200, **kwargs)

plot_artificial

pyldl.utils.plot_artificial(n_samples=50, model=None, file_name=None, **kwargs)

random_missing

pyldl.utils.random_missing(D, missing_rate=0.9, weighted=False)

accuracy

pyldl.metrics.accuracy(y, y_pred)

canberra

pyldl.metrics.canberra(D, D_pred)

Canberra distance. It is defined as:

\[\text{Can.}(\boldsymbol{u}, \, \boldsymbol{v}) = \sum^l_{j=1}\frac{\left\vert u_j - v_j \right\vert}{u_j + v_j}\text{.}\]
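
A row-wise NumPy sketch of this formula is given below. The name canberra_rows is hypothetical, entries are assumed to be strictly positive so that no denominator vanishes, and any averaging over samples that the library may perform is left out.

```python
import numpy as np

def canberra_rows(D, D_pred):
    """Hypothetical helper: Canberra distance for each pair of rows."""
    U = np.asarray(D, dtype=float)
    V = np.asarray(D_pred, dtype=float)
    return np.sum(np.abs(U - V) / (U + V), axis=1)

can = canberra_rows([[0.5, 0.5]], [[0.25, 0.75]])  # 0.25/0.75 + 0.25/1.25
```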

chebyshev

pyldl.metrics.chebyshev(D, D_pred)

Chebyshev distance. It is defined as:

\[\text{Cheby.}(\boldsymbol{u}, \, \boldsymbol{v}) = \max_j \left\vert u_j - v_j \right\vert\text{.}\]
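
As a sketch, the maximum absolute difference can be taken per row with NumPy. The name chebyshev_rows is hypothetical; any reduction over samples is not shown.

```python
import numpy as np

def chebyshev_rows(D, D_pred):
    """Hypothetical helper: largest absolute label-wise gap in each row."""
    U = np.asarray(D, dtype=float)
    V = np.asarray(D_pred, dtype=float)
    return np.max(np.abs(U - V), axis=1)

cheb = chebyshev_rows([[0.5, 0.3, 0.2]], [[0.4, 0.4, 0.2]])
```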

clark

pyldl.metrics.clark(D, D_pred)

Clark distance. It is defined as:

\[\text{Clark}(\boldsymbol{u}, \, \boldsymbol{v}) = \sqrt{\sum^l_{j=1}\frac{\left( u_j - v_j \right)^2}{\left( u_j + v_j \right)^2}}\text{.}\]
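
The formula can be sketched per row in NumPy as below. The name clark_rows is hypothetical, and entries are assumed to be strictly positive so that \(u_j + v_j\) never vanishes.

```python
import numpy as np

def clark_rows(D, D_pred):
    """Hypothetical helper: Clark distance for each pair of rows."""
    U = np.asarray(D, dtype=float)
    V = np.asarray(D_pred, dtype=float)
    return np.sqrt(np.sum((U - V) ** 2 / (U + V) ** 2, axis=1))

clk = clark_rows([[0.5, 0.5]], [[0.25, 0.75]])  # sqrt(1/9 + 1/25)
```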

cosine

pyldl.metrics.cosine(D, D_pred)

Cosine similarity. It is defined as:

\[\text{Cosine}(\boldsymbol{u}, \, \boldsymbol{v}) = \frac{\sum^l_{j=1}u_j v_j}{\sqrt{\sum^l_{j=1}u_j^2}\sqrt{\sum^l_{j=1}v_j^2}}\text{.}\]

dpa

pyldl.metrics.dpa(D, D_pred)

error_probability

pyldl.metrics.error_probability(D, D_pred)

Error probability. It is defined as:

\[\text{Err. prob.}(\boldsymbol{u}, \, \boldsymbol{v}) = 1 - u_{\arg\max(\boldsymbol{v})}\text{.}\]
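
In words, the measure looks up the true description degree of the label that the prediction ranks highest. A row-wise sketch, with the hypothetical name error_probability_rows:

```python
import numpy as np

def error_probability_rows(D, D_pred):
    """Hypothetical helper: 1 - u[argmax(v)] for each pair of rows."""
    U = np.asarray(D, dtype=float)
    V = np.asarray(D_pred, dtype=float)
    top = np.argmax(V, axis=1)                 # predicted top label per sample
    return 1.0 - U[np.arange(U.shape[0]), top]

err = error_probability_rows([[0.2, 0.5, 0.3]], [[0.1, 0.3, 0.6]])  # 1 - 0.3
```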

euclidean

pyldl.metrics.euclidean(D, D_pred)

fidelity

pyldl.metrics.fidelity(D, D_pred)

intersection

pyldl.metrics.intersection(D, D_pred)

Intersection similarity. It is defined as:

\[\text{Int.}(\boldsymbol{u}, \, \boldsymbol{v}) = \sum^l_{j=1} \min\left(u_j, \, v_j\right)\text{.}\]
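
A row-wise sketch of the formula, using the hypothetical name intersection_rows. For two valid distributions the value lies in \([0, 1]\) and equals 1 only when they coincide.

```python
import numpy as np

def intersection_rows(D, D_pred):
    """Hypothetical helper: sum of label-wise minima in each row."""
    U = np.asarray(D, dtype=float)
    V = np.asarray(D_pred, dtype=float)
    return np.sum(np.minimum(U, V), axis=1)

inter = intersection_rows([[0.5, 0.3, 0.2]], [[0.4, 0.4, 0.2]])  # 0.4 + 0.3 + 0.2
```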

kendall

pyldl.metrics.kendall(D, D_pred)

Kendall’s rank correlation coefficient. It is defined as:

\[\text{Ken.}(\boldsymbol{u}, \, \boldsymbol{v}) = \frac{2 \sum_{j < k} \text{sgn}(u_j - u_k) \text{sgn}(v_j - v_k) }{l (l-1)}\text{.}\]
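
For a single pair of vectors, the formula can be evaluated literally by enumerating all index pairs \(j < k\). The sketch below does this with NumPy; the name kendall_vec is hypothetical. Note that this is the plain form written above, which differs from tie-corrected variants when the vectors contain ties.

```python
import numpy as np

def kendall_vec(u, v):
    """Hypothetical helper: Kendall coefficient of two vectors, as in the formula."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    l = u.size
    j, k = np.triu_indices(l, k=1)  # all index pairs with j < k
    s = np.sign(u[j] - u[k]) * np.sign(v[j] - v[k])
    return 2.0 * s.sum() / (l * (l - 1))

tau_same = kendall_vec([0.1, 0.2, 0.7], [0.2, 0.3, 0.5])  # same ordering -> 1.0
```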

match_m

pyldl.metrics.match_m(D, D_pred, m=None)

max_roc_auc

pyldl.metrics.max_roc_auc(D, D_pred)

mean_absolute_error

pyldl.metrics.mean_absolute_error(D, D_pred, mode='macro')

mean_squared_error

pyldl.metrics.mean_squared_error(D, D_pred, mode='macro')

precision

pyldl.metrics.precision(y, y_pred)

score

pyldl.metrics.score(target: ndarray, pred: ndarray, metrics: list | None = None, return_dict: bool = False)

sensitivity

pyldl.metrics.sensitivity(y, y_pred)

sorensen

pyldl.metrics.sorensen(D, D_pred)

spearman

pyldl.metrics.spearman(D, D_pred)

Spearman’s rank correlation coefficient. It is defined as:

\[\text{Spear.}(\boldsymbol{u}, \, \boldsymbol{v}) = 1 - \frac{6 \sum_{j=1}^{l} (\rho(u_j) - \rho(v_j))^2 }{l(l^2 - 1)}\text{,}\]

where \(\rho\) is the rank of the element in the vector.
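
A sketch for one pair of vectors, using a double argsort to obtain ranks. The name spearman_vec is hypothetical, and ties are not averaged here, so the result is only meaningful when each vector has distinct entries.

```python
import numpy as np

def spearman_vec(u, v):
    """Hypothetical helper: Spearman coefficient from rank differences."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    l = u.size
    ru = np.argsort(np.argsort(u))  # 0-based ranks; the offset cancels in the difference
    rv = np.argsort(np.argsort(v))
    return 1.0 - 6.0 * np.sum((ru - rv) ** 2) / (l * (l**2 - 1))

rho = spearman_vec([0.1, 0.2, 0.7], [0.3, 0.2, 0.5])  # rank differences 1, 1, 0 -> 0.5
```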

specificity

pyldl.metrics.specificity(y, y_pred)

squared_chi2

pyldl.metrics.squared_chi2(D, D_pred)

top_k

pyldl.metrics.top_k(D, D_pred, k=None, mode='f1_score')

youden_index

pyldl.metrics.youden_index(y, y_pred)

zero_one_loss

pyldl.metrics.zero_one_loss(D, D_pred)

0/1 loss. It is defined as:

\[\text{0/1 loss}(\boldsymbol{u}, \, \boldsymbol{v}) = 1 - \delta(\arg\max(\boldsymbol{u}), \, \arg\max(\boldsymbol{v}))\text{,}\]

where \(\delta\) is the Kronecker delta function.

References

[Con16]

Laurent Condat. Fast projection onto the simplex and the l1 ball. Mathematical Programming, 158(1):575–585, 2016. URL: https://doi.org/10.1007/s10107-015-0946-6.

[CY14]

Jinhui Chen and Jian Yang. Robust subspace segmentation via low-rank representation. IEEE Transactions on Cybernetics, 44(8):1432–1445, 2014. URL: https://doi.org/10.1109/TCYB.2013.2286106.

[CCS10]

Jian-Feng Cai, Emmanuel J. Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010. URL: https://doi.org/10.1137/080738970.