In this article I want to share how we reduced our App Insights costs for archived data to about 10% of the original, while still keeping the ability to query the data ad hoc.
Continuous export
Because of various legal requirements we are obliged to keep our App Insights history for a considerably longer period than the 90 days offered by App Insights “classic”. We could switch our account to App Insights “workspaces”, where you can specify the storage retention… but it comes with a price: 2.64 EUR per GB for 2 years of retention. Now consider that we ingest on average about 6 TB of data per month across all of our App Insights accounts. That is 144 TB of data after 2 years… and an extra bill for data retention of roughly 16 000 EUR / month (based on the App Insights retention pricing of 0.110 EUR/GB/month).
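Just to make that arithmetic explicit, here is a quick PowerShell sketch of the estimate. The volumes and the price are the ones quoted above; the variable names are only illustrative:

# Rough retention-cost estimate based on the figures quoted above
$ingestedTBPerMonth = 6        # average ingestion across all our App Insights accounts
$retentionMonths    = 24       # 2 years of retention
$pricePerGBPerMonth = 0.110    # EUR/GB/month, App Insights retention pricing

$retainedGB  = $ingestedTBPerMonth * 1024 * $retentionMonths    # ~147 456 GB (144 TB) after 2 years
$monthlyBill = $retainedGB * $pricePerGBPerMonth                # ~16 200 EUR per month
"{0:N0} GB retained -> ~{1:N0} EUR/month" -f $retainedGB, $monthlyBill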
OK… but we do not need to query that data daily. In fact, over the last few years we needed to check something in the logs fewer than five times.
Thanks to continuous export we were able to reduce the cost of keeping the App Insights archive.
How does it work? You simply configure a few settings via the Azure portal or CLI/PowerShell, and a pipeline in Azure that you don’t need to take care of starts exporting App Insights data to a selected Azure Blob Storage container. You can configure what gets exported (e.g. messages, requests…), but that’s all. The files are JSON and the folder structure looks like /appInsightsAccount/type/yyyy-MM-dd/HH/guid_ID.blob.
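If you prefer scripting, a minimal sketch with the Az.ApplicationInsights module could look like the following; the resource names, document types and the SAS URI below are placeholders, so adjust them to your environment:

# Sketch: enable continuous export for one App Insights account (all names below are placeholders)
Import-Module Az.ApplicationInsights

$exportParams = @{
    ResourceGroupName = 'my-rg'
    Name              = 'my-appinsights'
    DocumentType      = 'Trace', 'Request'    # what will be exported
    StorageAccountId  = '/subscriptions/<subscriptionId>/resourceGroups/my-rg/providers/Microsoft.Storage/storageAccounts/myarchiveaccount'
    StorageLocation   = 'westeurope'
    StorageSASUri     = 'https://myarchiveaccount.blob.core.windows.net/ai-export?<sas-token>'
}
New-AzApplicationInsightsContinuousExport @exportParams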
Costs? With the first generation of blob storage, 2 years of retention cost us about 2 300 EUR/month.
The second generation of blob storage offers three tiers: hot, cool and archive.
The last one seemed perfect for our needs, and it cut the costs down to only about 220 EUR/month for 2 years of data retention (144 TB).
Storage configuration
How did we achieve automatic tier selection? First we configured the default tier for blobs to be “cool”.

Unfortunately, as you can see, you cannot specify the “archive” tier as the default one. However, you can define a lifecycle management policy that covers this…

{ "rules": [{
"enabled": true,
"name": "ReduceSpending",
"type": "Lifecycle",
"definition": {
"actions": {
"baseBlob": {
"tierToCool": { "daysAfterModificationGreaterThan": 1},
"tierToArchive": { "daysAfterModificationGreaterThan": 30 },
"delete": { "daysAfterModificationGreaterThan": 800 }}
},"filters": {"blobTypes": ["blockBlob"]}}}]}
Azure Blob Storage Archive tier re-hydration
Having the data in the archive tier also has some caveats. For instance, you do not have direct access to your data: you cannot simply download a blob or connect it via Data Factory.
To access your data you need to re-hydrate it first. And it sort of makes sense: files in the archive tier are probably stored on something like a tape library. So before you can read them, the tape needs to be put into a reader and rewound to the proper location, and the data needs to be written to some HDD (cool) or SSD (hot) storage. This is called re-hydration, and it will cost you 0.0203 EUR/GB plus ~5 EUR per 10 000 read operations.
Keeping the above in mind, retrieval of one day of archived App Insights data (e.g. messages) means about 2 880 read operations and ~50 GB of data in our case (retrieving just traces/messages for a selected region), which comes to about 2.5 EUR for re-hydration of messages per day.
Re-hydrating and preparing ~1.5 TB of data for a selected region and a one-month period for an ad-hoc query would cost us about 75 EUR.
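Again as a quick sketch, this is how those re-hydration numbers add up, using the prices and volumes quoted above (our case only):

# Rough re-hydration cost estimate (~2 880 read operations and ~50 GB of messages per day in our case)
$pricePerGB       = 0.0203    # EUR per re-hydrated GB
$pricePer10kReads = 5         # EUR per 10 000 read operations
$opsPerDay        = 2880
$gbPerDay         = 50

$oneDay   = $gbPerDay * $pricePerGB + ($opsPerDay / 10000) * $pricePer10kReads          # ~2.5 EUR
$oneMonth = 30 * ($gbPerDay * $pricePerGB + ($opsPerDay / 10000) * $pricePer10kReads)   # ~75 EUR
"One day: ~{0:N1} EUR, one month: ~{1:N0} EUR" -f $oneDay, $oneMonth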
# Sample PowerShell used to re-hydrate the archive
# Please create the target Azure storage account before running this script
# You might want to choose gen2 and enable Data Lake; see the querying section further in this article
Login-AzAccount
Set-AzContext -SubscriptionName XXX

# Context of the storage account that holds the exported (archived) blobs
$ctx = Get-AzStorageAccount -ResourceGroupName XXX -Name XXX | Select-Object -ExpandProperty Context

# Copy every matching blob to the target container and re-hydrate it to the hot tier
Get-AzStorageBlob -Context $ctx -Container XXX -Blob 'ainame_xxx/Messages/2019-03-*/*' |
    Select-Object -ExpandProperty Name |
    ForEach-Object {
        Start-AzStorageBlobCopy -SrcContainer XXX -SrcBlob $_ -DestContainer XXX -DestBlob $_ `
            -StandardBlobTier Hot -RehydratePriority Standard -Context $ctx
    }
Ad-hoc queries
Once we have our archive data re-hydrated into a new Azure storage account, you may want to either ingest it into, or attach it as an external table to, an existing or new Azure Data Explorer database.
Let’s assume you have an existing instance, something like “aionlineadhocqueries”, with one existing database that has a short data retention period:

Prepare the table and mapping
First you need to ingest one blob to prepare the target table and set up the mapping.
- Start ad-hoc ingestion
- Prepare a single sample blob with sample JSON data from the re-hydrated source. The blob account can’t be Data Lake gen2, as it does not support the SAS token. So in case you have re-hydrated your source data to a Data Lake gen2 storage, please copy & paste one file from it to some new or existing temporary blob account. Don’t worry, it’s a one-time obstacle
- Select the proper database you have created (e.g. AdHocQueries) and a proper name for the new table; let’s use “messages201903” in the examples below
- Include a link to the single sample blob, including the SAS token
- Continue to “Edit schema”, see screenshot below: e.g. switch to JSON, increase JSON levels, …
- Don’t forget to copy the mapping name
- Start ingestion! 🙂

After a few moments you should be able to see the table created and populated with data via https://dataexplorer.azure.com/.
You can check the table structure and data with a simple “messages201903 | take 10” Kusto query, and the mapping that was created with the command “.show table messages201903 ingestion mappings”.
Option A: ingest data via LightIngest
Now we need to ingest the rest of the data from the source blob storage. We will use LightIngest to do so.
Documentation: https://docs.microsoft.com/en-us/azure/data-explorer/lightingest
Package: https://www.nuget.org/packages/Microsoft.Azure.Kusto.Tools/
Run via PowerShell:
Install-Package -Name Microsoft.Azure.Kusto.Tools -ProviderName NuGet
cd (Split-Path (Get-Package -Name Microsoft.Azure.Kusto.Tools).Source)
cd tools
.\LightIngest.exe "https://ingest-aionlineadhocqueries.westeurope.kusto.windows.net;fed=true" `
    -database:AdHocQueries `
    -table:messages201903 `
    -source:"https://sourceblobaccount.blob.core.windows.net/container;ocJcdoPh72yx.......PRIMARY/SECONDARY..KEY==" `
    -pattern:"*.blob" `
    -format:json `
    -ingestionMappingRef:AppInsightsMessages_mapping

Option B: use the external table
Alternatively you may:
1) Prepare the external table
2) Prepare the external table mapping
3) Use external_table("messages201903") in your queries instead of messages201903
Prepare the external table
Right-click the one-time-ingested table in Data Explorer and use it to prepare the table schema:

Run the following in Data Explorer:
.create-or-alter external table messages201903Ext (*** original columns ***)
kind=adl
dataformat=json
(
    h@'abfss://containerName@xxx.dfs.core.windows.net/pathOrNothing;Fxl5WPRIMARYorSECONDARYkey********************=='
)
with (fileExtension = ".blob")
Example with table schema:
.create-or-alter external table messages201903Ext (message: dynamic, internal_data_id: guid, internal_data_documentVersion: real, context_application_version: string, context_data_eventTime: datetime) kind=adl…
Prepare the external table mapping
Now we need to create a mapping for the external table. We can utilize the existing mapping already created via the one-time-ingested table:
.show table messages201903 ingestion mappings
Create the mapping based on the output of the above command, e.g.:
.create external table messages201903Ext json mapping "messages201903Ext_Mapping"
'[{"column":"message","path":"$['message']","datatype":"dynamic","transform":""},{"column":"internal_data_id","path":"$['internal']['data']['id']","datatype":"guid","transform":""},{"column":"internal_data_documentVersion","path":"$['internal']['data']['documentVersion']","datatype":"real","transform":""},{"column":"context_application_version","path":"$['context']['application']['version']","datatype":"string","transform":""},{"column":"context_data_eventTime","path":"$['context']['data']['eventTime']","datatype":"datetime","transform":""}]'
(note: you may need to adjust the quoting/escaping when copying from the original mapping output so that the JSON above stays valid)
Query the data
You may use the web interface https://dataexplorer.azure.com/ or the rich desktop client: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/tools/kusto-explorer
Download and install the Kusto.Explorer tool from https://aka.ms/ke.
Now you can use the Kusto language the same way you are used to querying App Insights, e.g.
messages201903
| summarize count() by bin(context_data_eventTime, 1h)
| render timechart

Enjoy!