
    log loss

    Mar 01, 2025, 1 min read

aka logistic loss or cross-entropy loss. Log loss is NOT the same as log-likelihood: log loss is the negative mean log-likelihood of the true labels under the predicted probabilities, so minimizing log loss is equivalent to maximizing the average log-likelihood.
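A minimal sketch of binary log loss, computed by hand and cross-checked against scikit-learn's `log_loss`; the toy labels and probabilities are illustrative, not from this note:

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy binary labels and predicted probabilities of the positive class
# (illustrative values only).
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.4])

# Clip to avoid log(0), mirroring what implementations do internally.
eps = 1e-15
p = np.clip(y_pred, eps, 1 - eps)

# Log loss = negative mean log-likelihood of the true labels.
manual = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(manual)                    # ~0.3414
print(log_loss(y_true, y_pred))  # matches the manual computation
```

Note the sign and averaging: the per-sample log-likelihoods are summed, negated, and averaged, which is exactly why log loss and log-likelihood are related but not interchangeable.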

    Backlinks

    • RSNA 2023 Abdominal Trauma Detection
    • cross-entropy loss
    • log loss
