Delete Data
Use the Python client to delete data from a Synnax cluster.
The Synnax client allows deletion of time ranges of data in any channel: after each deletion operation is complete, all future reads will no longer include the deleted data. However, it may take a while before the underlying file sizes decrease – this allows deletion operations to be served in a rapid manner and only actually collect the unwanted data when the load on the server is low.
Note the differences between deleting data and deleting a channel – once a channel is deleted, it no longer exists; whereas when some data in a channel is deleted, we can write over that time range with new data or even delete some more data. Even if an entire channel’s data is deleted, the channel is still in the database, albeit empty.
Deleting Data From a Channel
The delete
method of the client allows deletion of data (not to be confused with the
delete
method of the Channel
class, which deletes channels). To delete a chunk of
data, simply pass in the channel name(s) or key(s) and the time range to delete. As throughout
Synnax, remember that a time range is start-inclusive and end-exclusive, i.e.
data at the start time stamp is deleted and data at the end time stamp is not.
For example, to remove data in the range [00:01, 00:03)
on the timestamps
and
my_precise_tc
channels:
import synnax as sy
client = sy.Synnax(...)
# timestamps and my_precise_tc are two channels containing data.
client.delete(
["my_index_timestamps", "my_precise_tc"],
sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(3 * sy.TimeSpan.SECOND))
)
Using channel name(s) to delete data will delete data in all channels with the given name(s). Using keys to delete is more preferable to prevent accidental deletion!
Note that delete
is idempotent, meaning consecutive calls to delete
on overlapping
time ranges are allowed:
# no additional data deleted
client.delete(
["my_index_timestamps", "my_precise_tc"],
sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(3 * sy.TimeSpan.SECOND))
)
# 00:01 to 00:10 deleted
client.delete(
["my_index_timestamps", "my_precise_tc"],
sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(10 * sy.TimeSpan.SECOND))
)
Limitations of Deletions
In some situations, delete
raises an error. If some channel keys or names do not exist
in the database, the entirety of the delete
operation fails, no data is deleted, and a
NotFound
error is returned:
# Suppose 111 and 112 are keys to channels that do exist, since 113
# does not exist, none of these channels' data get deleted.
client.delete([111, 112, 113], time_range_to_delete)
In the case where a requested channel is not found, delete
is atomic: no data will be
deleted and the operation will fail. However, in all other cases, delete
is not
atomic: failure in deleting data one channel halts the entire operation and raises an
error immediately.
In addition, if a delete
call is made to an index channel that other channels depend
on data in the requested time range, an error is raised:
# If my_precise_tc is indexed by my_index_timestamps from 1 second to 3 seconds,
# we cannot delete my_index_timestamps. This call raises an error.
client.delete(
["my_index_timestamps"],
sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(3 * sy.TimeSpan.SECOND))
)
# If we delete my_precise_tc, the dependent, at the same time as my_index_timestamps,
# no errors are raised.
client.delete(
["my_precise_tc", "my_index_timestamps"],
sy.TimeStamp(1 * sy.TimeSpan.SECOND).range(sy.TimeStamp(3 * sy.TimeSpan.SECOND))
)
Last but not least, delete
calls on any channel with a writer whose start time is
before the deleting time range raise an error. This is to ensure that the writer and the
deleter do not contend over data in the same region.
w = client.open_writer(
start= sy.TimeStamp(10 * sy.TimeSpan.SECOND),
channels=["my_precise_tc"],
)
# error raised since writer start 00:10 is before deleting time range [00:12 - 00:30)
client.delete(
["my_precise_tc"],
sy.TimeStamp(12 * sy.TimeSpan.SECOND.range(sy.TimeStamp(30 * TimeSpan.SECOND))
)
Once writers starting before the deleting time range are closed, calls to delete
may
proceed normally.