Data Requirements
Data provided to ReplayBG must comply with strict format requirements and should be selected following some best practices.
Format requirements
Format requirements depend on the selected blueprint, i.e., single-meal or multi-meal. The details are therefore presented separately for each blueprint below.
Single Meal blueprint
A "single meal" is a period of time during which a subject had only one main meal and a corresponding basal-bolus insulin administration. Usually, this period spans at most 6 to 8 hours, starts near the main meal, and ends just before the subsequent main meal and/or after a reasonable amount of time.
data must be saved in a .csv file and contain the following columns:
t: the timestamps when the data of the corresponding row were recorded (format DD-MMM-YYYY HH:mm:SS, for example 20-Dec-2013 10:35:00). The sampling grid defined by the t column must be homogeneous, e.g., have a datapoint every 5 minutes.
glucose: the glucose concentration (mg/dl) at t. Can contain NaN values.
cho: the meal intake (g/min) at t. Cannot contain NaN values. If no meals were recorded at t, just put 0 there.
bolus: the insulin bolus (U/min) administered at t. Cannot contain NaN values. If no insulin boluses were administered at t, just put 0 there.
basal: the basal insulin (U/min) administered at t. Cannot contain NaN values. If no basal insulin was administered at t, just put 0 there.
bolus_label: the type of bolus at time t. Each bolus entry > 0 must have a label defined. Can be:
  B if it is the bolus of a breakfast.
  L if it is the bolus of a lunch.
  D if it is the bolus of a dinner.
  C if it is a corrective bolus.
  S if it is the bolus of a snack.
If other columns are present in your data file, they will be ignored.
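The column rules above can be checked before handing a file to ReplayBG. The following is a minimal sketch using only the Python standard library; check_rows, REQUIRED, and TIME_FORMAT are hypothetical names introduced for illustration and are not part of the ReplayBG API. It assumes the timestamp format maps to the strptime pattern shown and that an empty cell counts as a missing value.

```python
import csv
import io
import math
from datetime import datetime

TIME_FORMAT = "%d-%b-%Y %H:%M:%S"  # assumed mapping of DD-MMM-YYYY HH:mm:SS
REQUIRED = ["t", "glucose", "cho", "bolus", "basal", "bolus_label"]


def check_rows(rows):
    """Return a list of problems found in rows (dicts keyed by column name)."""
    if not rows:
        return ["no data rows"]
    missing = [c for c in REQUIRED if c not in rows[0]]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    # The sampling grid must be homogeneous: a single constant step between rows.
    times = [datetime.strptime(r["t"], TIME_FORMAT) for r in rows]
    steps = {(b - a).total_seconds() for a, b in zip(times, times[1:])}
    if len(steps) > 1:
        problems.append("sampling grid is not homogeneous")

    # cho, bolus and basal cannot be NaN (or empty); glucose may be.
    for i, r in enumerate(rows):
        for col in ("cho", "bolus", "basal"):
            text = r[col].strip()
            if text == "" or math.isnan(float(text)):
                problems.append(f"row {i}: {col} is missing")
    return problems


# Made-up sample data, only to exercise the checks.
sample = """t,glucose,cho,bolus,basal,bolus_label
20-Dec-2013 10:35:00,120,0,0,0.01,
20-Dec-2013 10:40:00,NaN,12,1.5,0.01,B
20-Dec-2013 10:45:00,131,0,0,0.01,
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(check_rows(rows))
```

The sketch reports problems instead of raising, so that all issues in a file can be listed in one pass.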
NOTE
The total length of the simulation, simulation_length, is defined in minutes and determined automatically by ReplayBG using the t column of data and the yts input parameter provided to the ReplayBG object builder.
For example, if yts is 5 minutes and t starts at 20-Dec-2013 10:36:00 and ends at 20-Dec-2013 10:46:00, then simulation_length will be 10.
Tips
If you do not need bolus_label (e.g., you do not plan to use it during replay), just add an empty bolus_label column.
Warning
If more than one meal is present in the provided file, ReplayBG will consider the first meal as the "main" meal and all the others as "other" meals. The resulting kabs
and
Requirements during replay
When replaying (using the replay method), the requirements above no longer apply in the following circumstances:
glucose: during replay this column is simply ignored.
cho: if cho_source is generated, since the CHO events will be generated by the provided handler during the replay simulation.
bolus and bolus_label: if bolus_source is dss, since the insulin bolus events will be generated by the provided handler during the replay simulation.
basal: if basal_source is dss, since the basal insulin will be generated by the provided handler during the replay simulation.
Multi Meal blueprint
A "multi meal" is a period of time during which a subject had more than one main meal and a corresponding basal-bolus insulin administration regimen. Think of such a period as a day in which multiple meals occur, or even multiple days.
data must be saved in a .csv file and contain (at least) the following columns:
t: the timestamps when the data of the corresponding row were recorded (format DD-MMM-YYYY HH:mm:SS, for example 20-Dec-2013 10:35:00). The sampling grid defined by the t column must be homogeneous.
glucose: the glucose concentration (mg/dl) at t. Can contain NaN values.
cho: the meal intake (g/min) at t. Cannot contain NaN values. If no meals were recorded at t, just put 0 there.
bolus: the insulin bolus (U/min) administered at t. Cannot contain NaN values. If no insulin boluses were administered at t, just put 0 there.
basal: the basal insulin (U/min) administered at t. Cannot contain NaN values. If no basal insulin was administered at t, just put 0 there.
cho_label: the type of cho at time t. Each cho entry > 0 must have a label defined. Can be:
  B if it is a breakfast.
  L if it is a lunch.
  D if it is a dinner.
  H if it is a hypotreatment.
  S if it is a snack.
bolus_label: the type of bolus at time t. Each bolus entry > 0 must have a label defined. Can be:
  B if it is the bolus of a breakfast.
  L if it is the bolus of a lunch.
  D if it is the bolus of a dinner.
  C if it is a corrective bolus.
  S if it is the bolus of a snack.
If other columns are present in your data file, they will be ignored.
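The labeling rule for cho_label and bolus_label (every entry > 0 needs one of the allowed labels) can be sketched as a small check. The helper unlabeled_events and the two label sets below are hypothetical names, not ReplayBG functions; rows are assumed to be dictionaries of strings, as produced by csv.DictReader. If bolus_label is intentionally left empty because it is not needed, the bolus check can simply be skipped.

```python
CHO_LABELS = {"B", "L", "D", "H", "S"}    # breakfast, lunch, dinner, hypotreatment, snack
BOLUS_LABELS = {"B", "L", "D", "C", "S"}  # breakfast, lunch, dinner, corrective, snack


def unlabeled_events(rows, value_col, label_col, allowed):
    """Return the indices of rows whose value is > 0 but whose label is not allowed."""
    bad = []
    for i, row in enumerate(rows):
        if float(row[value_col]) > 0 and row[label_col] not in allowed:
            bad.append(i)
    return bad


# Made-up rows: the third one has a meal without a label.
rows = [
    {"cho": "0", "cho_label": "", "bolus": "0", "bolus_label": ""},
    {"cho": "10", "cho_label": "B", "bolus": "1.2", "bolus_label": "B"},
    {"cho": "3", "cho_label": "", "bolus": "0.5", "bolus_label": "C"},
]
cho_problems = unlabeled_events(rows, "cho", "cho_label", CHO_LABELS)
bolus_problems = unlabeled_events(rows, "bolus", "bolus_label", BOLUS_LABELS)
print(cho_problems, bolus_problems)
```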
Warning
The cho and bolus columns must contain at least one event when twinning.
Tips
If you do not need bolus_label (e.g., you do not plan to use it during replay), just add an empty bolus_label column.
Tips
A representative data file for the multi meal blueprint can be found in example/data/multi-meal_example.csv
Requirements during replay
When replaying (using the replay method), the requirements above no longer apply in the following circumstances:
glucose: during replay this column is simply ignored.
cho and cho_label: if cho_source is generated, since the CHO events will be generated by the provided handler during the replay simulation.
bolus and bolus_label: if bolus_source is dss, since the insulin bolus events will be generated by the provided handler during the replay simulation.
basal: if basal_source is dss, since the basal insulin will be generated by the provided handler during the replay simulation.
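The relaxed requirements during replay, for both blueprints, can be summarized as a lookup from the source settings to the columns that still need to be filled in. This is an illustrative sketch only: columns_needed_for_replay is a hypothetical helper, and the string "data" is used here as a placeholder for "any source other than generated or dss", which is an assumption about how the alternative is named.

```python
def columns_needed_for_replay(blueprint, cho_source, bolus_source, basal_source):
    """List the data columns that must still be provided for a replay."""
    needed = ["t"]  # glucose is ignored during replay, so it is never listed
    if cho_source != "generated":
        needed.append("cho")
        if blueprint == "multi-meal":
            needed.append("cho_label")  # only the multi-meal blueprint has cho_label
    if bolus_source != "dss":
        needed += ["bolus", "bolus_label"]
    if basal_source != "dss":
        needed.append("basal")
    return needed


# Everything produced by handlers: only the time grid is needed.
print(columns_needed_for_replay("single-meal", "generated", "dss", "dss"))
# Meals and basal read from the file, boluses produced by a handler.
print(columns_needed_for_replay("multi-meal", "data", "dss", "data"))
```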
Best practices
Users of ReplayBG should be aware of several practical aspects and take care when selecting the portion of data to work with. The details follow.
Starting point
The twinning procedure of ReplayBG does not estimate the states of the blueprint mathematical model and, more importantly, the corresponding initial conditions. Identifying such initial conditions is crucial to correctly estimating the unknown model parameter vector
Tips
When twinning and replaying intervals (using the twin and the replay methods, respectively) instead of single portions of data, things change. The starting-point problem applies only to the first portion, where initial steady-state conditions must be assumed. Each subsequent portion starts from the immediately following datapoint, with initial conditions defined by x0 and previous_data_name.
For more information on the x0 and previous_data_name parameters when twinning and replaying, please refer to the Twinning Procedure and the Replaying pages.
Minimum data length
As a rule of thumb, we suggest using portions of data that span at least 6 hours. As demonstrated in the literature, this yields better parameter estimates and simulation results.
Data gaps
To make the twinning procedure more reliable, data portions with significant data gaps (i.e., more than 10% of glucose readings missing) or without a single reported meal intake or insulin bolus should be discarded, to avoid creating digital twins that do not represent the actual underlying physiology.
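The selection rules in this section (at least 6 hours of data, at most 10% of glucose readings missing, and at least one meal and one bolus) can be combined into a simple screening function. This is a minimal sketch under stated assumptions: portion_is_usable and its parameters are hypothetical names, missing glucose readings are assumed to be NaN or None, and the span is measured from the first to the last sample.

```python
import math


def portion_is_usable(glucose, cho, bolus, sample_minutes,
                      max_missing=0.10, min_hours=6):
    """Screen a data portion against the best-practice rules described above."""
    n = len(glucose)
    if n == 0:
        return False
    # Span covered by the samples, from the first to the last one.
    long_enough = (n - 1) * sample_minutes >= min_hours * 60
    # Fraction of glucose readings that are missing.
    missing = sum(1 for g in glucose if g is None or math.isnan(g))
    few_gaps = missing / n <= max_missing
    # At least one meal and at least one insulin bolus must be reported.
    has_events = any(c > 0 for c in cho) and any(b > 0 for b in bolus)
    return long_enough and few_gaps and has_events


# Made-up 6-hour portion on a 5-minute grid (73 samples) with 5 missing readings.
glucose = [120.0] * 68 + [float("nan")] * 5
cho = [0.0] * 73
bolus = [0.0] * 73
cho[2] = 10.0
bolus[2] = 1.0
usable = portion_is_usable(glucose, cho, bolus, sample_minutes=5)
print(usable)
```

Requiring both a meal and a bolus follows the warning that the cho and bolus columns must each contain at least one event when twinning.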