
Problem Type: classification, since we’re classifying each segment of time as awake or asleep

Input: wrist accelerometer data

Output: the times when a user wakes up AND the times when a user falls asleep

Eval Metric: Average Precision

  • note: the 1st-place solution says that:
    • The competition’s evaluation metric doesn’t differentiate predictions within a 30-second range from the ground truth event.
    • so a prediction within 30 seconds of the true event scores the same as an exact hit


Detect sleep onset and wake from wrist-worn accelerometer data

Note: if you submit multiple predictions for the same sleep event, you’ll get the same score (or better)


  • (1st) - very smart postprocessing to make fuzzy predictions precise

  • (2nd) - Two-stage prediction: uses LGBM to sharpen the results of the first model (rather than smart postprocessing)

    • used ensembling similar to Weighted Boxes Fusion (WBF)
      • code
        import numpy as np
        import pandas as pd

        # ensemble after 2nd stage: fuse model 1's predictions into model 0 (the base)
        # assumes df_0/df_1 each hold one event type (e.g. only onsets); otherwise onsets could fuse with wakeups
        def weighted_fusion_ensemble(df_0, df_1, distance_threshold=100):
            weight_wo_fusion = 0.5  # down-weight predictions that only one model made
            large_val = 1e8         # sentinel so each base prediction is fused at most once
            series_ids = df_0['series_id'].unique()
            out_df = []
            for series_id in series_ids:
                df_0_id = df_0[df_0['series_id'] == series_id].copy()
                df_1_id = df_1[df_1['series_id'] == series_id].copy()
                df_0_id = df_0_id.sort_values("score", ascending=False).reset_index(drop=True)
                df_1_id = df_1_id.sort_values("score", ascending=False).reset_index(drop=True)
                df_0_id["step"] = df_0_id["step"].astype(float)  # fused steps can be fractional
                steps_0 = df_0_id['step'].values.copy()  # base
                steps_1 = df_1_id['step'].values.copy()
                scores_0 = df_0_id['score'].values.copy()  # base
                scores_1 = df_1_id['score'].values.copy()
                not_assigned_df = []
                for step, score in zip(steps_1, scores_1):
                    dists = np.abs(steps_0 - step)
                    argmin = np.argmin(dists)
                    min_dist = dists[argmin]
                    if min_dist < distance_threshold:
                        # nearest base prediction is close enough: fuse the pair
                        f_step = steps_0[argmin]
                        f_score = scores_0[argmin]
                        add_step = step
                        add_score = score
                        # new_score = (f_score + add_score) / 2
                        # score-weighted averages of both the score and the step
                        new_score = (f_score * f_score + add_score * add_score) / (f_score + add_score)
                        new_step = (f_step * f_score + add_step * add_score) / (f_score + add_score)
                        df_0_id.loc[argmin, "score"] = new_score
                        df_0_id.loc[argmin, "step"] = new_step
                        steps_0[argmin] = large_val  # large val to avoid assigning again
                    else:
                        # no partner in the base model: keep it, but down-weighted
                        not_assigned = df_1_id[df_1_id['step'] == step].copy()
                        not_assigned['score'] = score * weight_wo_fusion
                        not_assigned_df.append(not_assigned)
                df_0_id.loc[steps_0 != large_val, "score"] *= weight_wo_fusion  # base preds never fused
                if len(not_assigned_df) > 0:
                    not_assigned_df = pd.concat(not_assigned_df)
                    df_0_id = pd.concat([df_0_id, not_assigned_df])
                out_df.append(df_0_id)
            out_df = pd.concat(out_df).reset_index(drop=True)
            return out_df
    • After getting a score for each step, he uses an LGBM model to predict scores at steps shifted slightly away from the original ones
      • this gives him extra predictions near each event; the extra predictions won’t harm his score
    • he concatenates these new predictions back onto the original table from step 2
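A rough sketch of the candidate-expansion idea, assuming NumPy arrays of first-stage steps and scores. The function name `add_shifted_candidates`, the `offsets`, and `score_fn` (a stand-in for the second-stage LGBM) are all hypothetical. None of them come from the 2nd-place code.

```python
import numpy as np

def add_shifted_candidates(steps, scores, score_fn, offsets=(-12, -6, 6, 12)):
    """Append extra candidate steps near each first-stage prediction.

    steps, scores: 1-D arrays of first-stage predictions.
    score_fn: hypothetical stand-in for the 2nd-stage LGBM; maps an
        array of candidate steps to an array of scores.
    offsets: hypothetical shift sizes, in steps.
    """
    steps = np.asarray(steps)
    scores = np.asarray(scores, dtype=float)
    # every (original step + offset) pair becomes a new candidate
    shifted = (steps[:, None] + np.asarray(offsets)[None, :]).ravel()
    shifted = shifted[shifted >= 0]           # stay inside the series
    shifted = np.setdiff1d(shifted, steps)    # drop duplicates of existing steps
    new_scores = np.asarray(score_fn(shifted), dtype=float)
    # concatenate new predictions onto the original ones, best score first
    all_steps = np.concatenate([steps, shifted])
    all_scores = np.concatenate([scores, new_scores])
    order = np.argsort(-all_scores, kind="stable")
    return all_steps[order], all_scores[order]
```

For example, `add_shifted_candidates([100, 500], [0.9, 0.8], lambda s: np.full(len(s), 0.1), offsets=(-6, 6))` keeps the two originals first, then adds steps 94, 106, 494 and 506 with score 0.1.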
  • (3rd) - reduce granularity from 5 sec to 30 sec (the most efficient solution) and remove noise

    • We decided to divide the series into one-day sequences and reduce the granularity from 5 secs to 30 secs
      • probably to reduce model size and speed things up. They do have a 60-sec leeway from the eval metric anyway
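Going from 5-second to 30-second steps means merging every 6 consecutive steps. The notes don't say how the 3rd-place team aggregated each block, so this sketch assumes simple mean pooling. The name `downsample` is hypothetical.

```python
import numpy as np

def downsample(x, factor=6):
    """Average non-overlapping blocks of `factor` steps along axis 0.

    With 5-second steps, factor=6 gives 30-second steps. Trailing steps
    that don't fill a whole block are dropped (an assumption; padding
    them would also work).
    """
    x = np.asarray(x, dtype=float)
    n = (len(x) // factor) * factor
    return x[:n].reshape(n // factor, factor, *x.shape[1:]).mean(axis=1)
```

One day at 5-second resolution is 17280 steps, and this turns it into 2880 steps of 30 seconds each.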
    • feature engineering
    • Training their models
      • Note: the model had to predict 2 targets (one for onsets and one for wakeups)
      • Target transformation: also mark the two steps before and the one step after each event as positive (0,0,0,0,1,0,0,0 → 0,0,1,1,1,1,0,0)
        • probably because hitting a single exact time step is very hard
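Here is a minimal sketch of the target widening for a single binary target vector. The function name `widen_targets` is hypothetical. For the two-target case, you would apply it to the onset and wakeup channels separately.

```python
import numpy as np

def widen_targets(y, back=2, forward=1):
    """Turn each 1 in a binary event vector into a small window.

    Marks `back` steps before and `forward` steps after every event,
    e.g. 0,0,0,0,1,0,0,0 -> 0,0,1,1,1,1,0,0.
    """
    y = np.asarray(y)
    out = np.zeros_like(y)
    for i in np.flatnonzero(y):
        # clip at the start of the series; slicing handles the end
        out[max(i - back, 0): i + forward + 1] = 1
    return out
```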
      • Loss: cross-entropy
      • A good augmentation trick was to reverse all the series during training, this allowed us to have more sequences and increased our local validation by 0.01
        • code
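          # excerpt from the 3rd-place training code; arrays are presumably (batch, time, ...)
          if ADD_INVERT_SERIES and MODE == 'train':
              num_array_flip = np.flip(num_array_, axis=1).copy()  # reverse time for the features
              # reverse time, then flip the target channels (presumably onset <-> wakeup)
              target_array_flip = np.flip(np.flip(target_array_, axis=1), axis=2)
              mask_array_flip = np.flip(mask_array_, axis=1)
              pred_use_array_flip = np.flip(pred_use_array_, axis=1)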
        • They flipped on axis=1, probably because axis=0 indexes the training instances. The targets are also flipped on axis=2, which presumably swaps the onset and wakeup channels: in reversed time, falling asleep looks like waking up.
      • This seems reasonable because they are NOT extrapolating into the future. They are only locating onset/wakeup times, so it’s fine for the model to see the data in reverse time.
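The reversal trick can be sketched as a self-contained augmentation step. This assumes features shaped (batch, time, features) and targets shaped (batch, time, 2), with channel 0 as onset and channel 1 as wakeup. That layout and the name `reverse_augment` are assumptions, not the 3rd-place code.

```python
import numpy as np

def reverse_augment(x, y):
    """Append time-reversed copies of every sequence to the batch.

    x: (batch, time, features); y: (batch, time, 2), where channel 0 is
    assumed to be onset and channel 1 wakeup. Reversing time turns
    falling asleep into waking up, so the two target channels are
    swapped as well.
    """
    x_rev = np.flip(x, axis=1)
    y_rev = np.flip(np.flip(y, axis=1), axis=2)  # reverse time, swap channels
    return np.concatenate([x, x_rev]), np.concatenate([y, y_rev])
```

This doubles the batch. An onset at step 1 in a 4-step sequence becomes a wakeup at step 2 in its reversed copy.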
    • important considerations:
      • used Rolling_mean(center=True) to smooth the predictions
      • Take the highest predictions every certain distance (this allows us to eliminate false positives)
        • I COULDN’T find this logic in their GitHub repo (at a glance)
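Here is one plausible reading of "smooth, then take the highest predictions every certain distance". It is not the team's code. The smoother below is a NumPy version of pandas' `Series.rolling(window, center=True, min_periods=1).mean()` for odd window sizes. Peak picking is done by greedy suppression: keep the top score, drop everything within `distance` steps of it, and repeat. Both function names and the `min_score` cutoff are hypothetical.

```python
import numpy as np

def smooth_centered(p, window=5):
    """Centered moving average; edges average over the available values."""
    p = np.asarray(p, dtype=float)
    half = window // 2
    c = np.concatenate([[0.0], np.cumsum(p)])
    idx = np.arange(len(p))
    lo = np.clip(idx - half, 0, len(p))
    hi = np.clip(idx + half + 1, 0, len(p))
    return (c[hi] - c[lo]) / (hi - lo)

def pick_peaks(p, distance, min_score=0.0):
    """Greedy suppression: keep the best point, suppress its neighbours
    within `distance` steps, repeat. Returns indices, best first."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(-p, kind="stable")
    suppressed = np.zeros(len(p), dtype=bool)
    keep = []
    for i in order:
        if p[i] < min_score:
            break  # remaining scores are all lower
        if suppressed[i]:
            continue
        keep.append(int(i))
        suppressed[max(i - distance, 0): i + distance + 1] = True
    return keep
```

With `distance=2`, the scores `[0, 0.2, 0.9, 0.3, 0, 0, 0.5, 0.1]` keep only indices 2 and 6. The nearby 0.3 and 0.1 are treated as false positives.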
      • We decided to create sequences of days starting at 17:00 local time. If a day was not complete at the beginning or at the end, we added padding.
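A sketch of cutting a series into day-long chunks that start at 17:00, with padding and a mask for the padded steps. It assumes 5-second steps and that you know the local time of the first sample. `STEP_SEC`, `DAY_STEPS` and `split_into_days` are illustrative names, not taken from the 3rd-place repo. At 30-second granularity you would set `STEP_SEC = 30`, giving 2880 steps per day.

```python
import numpy as np

STEP_SEC = 5                          # assumed sampling period in seconds
DAY_STEPS = 24 * 60 * 60 // STEP_SEC  # 17280 steps per day

def split_into_days(x, start_sec_of_day, anchor_sec=17 * 3600, pad_value=0.0):
    """Cut a (time, features) series into day chunks starting at 17:00.

    start_sec_of_day: local time of x[0] in seconds since midnight
        (assumed to be a multiple of STEP_SEC).
    Returns (chunks, mask): chunks is (n_days, DAY_STEPS, features) and
    mask is True for real (non-padded) steps.
    """
    x = np.asarray(x, dtype=float)
    # steps between the most recent 17:00 and the first sample
    front = ((start_sec_of_day - anchor_sec) % 86400) // STEP_SEC
    # pad the end so the total length is a whole number of days
    back = (-(front + len(x))) % DAY_STEPS
    pad = ((front, back),) + ((0, 0),) * (x.ndim - 1)
    padded = np.pad(x, pad, constant_values=pad_value)
    mask = np.pad(np.ones(len(x), dtype=bool), (front, back), constant_values=False)
    n_days = len(padded) // DAY_STEPS
    return (padded.reshape(n_days, DAY_STEPS, *x.shape[1:]),
            mask.reshape(n_days, DAY_STEPS))
```

A recording that starts at 18:00 gets 720 padded steps (one hour) in front, so its first real sample lands at index 720 of the first chunk.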
      • the final weighting of the models in the ensemble was adjusted manually based on local CV
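  • (4th) - smart feature engineering. Input dim: 17280 x n_features, output dim: 17280 x 2 (17280 steps = one day at 5-second resolution; presumably one output channel each for onset and wakeup)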


  • In time series problems where you are identifying events within the series (not predicting future values), you can double your training data (and get better results) by reversing the series in time. Time-asymmetric events swap roles in the reversed copy (an onset looks like a wakeup), so swap those labels too.
  • When postprocessing is awkward (e.g. there’s a constraint that there is only one onset and one wakeup event a day), you can use two models:
      1. The first gives you probability distributions of where the event is
      2. The second interprets these peaks and sharpens the result
      • because simply taking the maximum points won’t yield the best results
  • You can use Weighted Boxes Fusion (WBF) ensembling rather than taking the mean of all model outputs
  • Taking the absolute value of anglez (the arm-angle feature) can give you a boost in CV