Setting up an Azure Data Lake and Azure Data Factory using PowerShell


#first ensure that you have an Azure Data Lake that you want to use for ODX
#$resourceGroups = Get-AzureRmResourceGroup
#$azureDataLakeNames = "";

# foreach ($resourceGroup in $resourceGroups) {
#     $azureDataLake = Get-AzureRmDataLakeStoreAccount -ResourceGroupName $resourceGroup.ResourceGroupName
#     $azureDataLakeName = $azureDataLake.Name
#     $azureDataLakeNameLength = $azureDataLakeName.Length
#     $azureDataLakeNameLength -gt 0
#     if ($azureDataLakeNameLength -gt 0) {
#         $azureDataLakeNames += " " + $azureDataLake.Name + " (resource group: " + $resourceGroup.ResourceGroupName + " & location: " + $resourceGroup.Location + ")"
#     }
# }
# "-----------"
#"DataLakeNames: " + $azureDataLakeNames
#REQUIRED: you must enter a unique appname which will be used as the security principal
$appname = "sune"
#OPTIONAL: change the password for the security principal
$password = "Xyzpdq"
#run the above script, and replace DATALAKESTORENAME with the appropriate name/rg/location from your existing data lake store; or enter a new name to have a data lake created
$dataLakeStoreName = "sunelake"
$odxResourceGroup = "odxDemo"
$dataLakeLocation = "Central US" #Central US, East US 2, North Europe
#recommended to use the same resource group as the data lake for simplicity, but you can use any resource group or enter a new name to create
$dataFactoryResourceGroup = $odxResourceGroup
#specify where you want your data factory; current options are East US, North Europe, West Central US, and West US
$dataFactoryLocation = "West US"

#create odxResourceGroup, if it does not exist
Get-AzureRmResourceGroup -Name $odxResourceGroup -ErrorVariable notPresent1 -ErrorAction SilentlyContinue
if ($notPresent1) {
    New-AzureRmResourceGroup -Location $dataLakeLocation -Name $odxResourceGroup
}

#create data lake, if it does not exist
Get-AzureRmDataLakeStoreAccount -Name $dataLakeStoreName -ErrorVariable notPresent2 -ErrorAction SilentlyContinue
if ($notPresent2) {
    New-AzureRmDataLakeStoreAccount -Location $dataLakeLocation -Name $dataLakeStoreName -ResourceGroupName $odxResourceGroup
}

#-HomePage and -IdentifierUris expect a valid URI, so put your own URL prefix in the empty string
$homepage = "" + $appname

#create the application and its security principal
$app = New-AzureRmADApplication -DisplayName $appname -HomePage $homepage -IdentifierUris $homepage -Password $password
$app = Get-AzureRmADApplication -DisplayName $appname

$servicePrincipal = New-AzureRmADServicePrincipal -ApplicationId $app.ApplicationId
#wait for the new service principal to become available before assigning roles
Start-Sleep 10
New-AzureRmRoleAssignment -RoleDefinitionName "Contributor" -Id $servicePrincipal.Id -ResourceGroupName $odxResourceGroup
New-AzureRmRoleAssignment -RoleDefinitionName "Data Factory Contributor" -Id $servicePrincipal.Id -ResourceGroupName $odxResourceGroup
New-AzureRmRoleAssignment -RoleDefinitionName "Reader" -Id $servicePrincipal.Id -ResourceGroupName $odxResourceGroup

#Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path / -AceType User -Id $app.ApplicationId -Permissions All
Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path / -AceType User -Id $servicePrincipal.Id -Permissions All
#create the /ODX folder, if it does not exist
Get-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path /ODX -ErrorVariable notPresent3 -ErrorAction SilentlyContinue
if ($notPresent3) {
    New-AzureRmDataLakeStoreItem -Folder -AccountName $dataLakeStoreName -Path /ODX
}
Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path /ODX -AceType User -Id $servicePrincipal.Id -Permissions All
#Start-Sleep 60 #there seems to be a lag between when these permissions are added and when they are applied…trying 1 minute to start

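#NOTE: this assumes the account has a single subscription; with several, Get-AzureRmSubscription returns all of them
$subscription = Get-AzureRmSubscription
$subscriptionId = ($subscription).Id
$tenantId = ($subscription).TenantId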

#ensure there are permissions
#Get-AzureRmDataLakeStoreItemAclEntry -Account $dataLakeStoreName -Path /

#get information on datalake
$dataLake = Get-AzureRmDataLakeStoreAccount -Name $dataLakeStoreName

#here is a printout
$text1 = "Azure Data Lake Name: " + $dataLakeStoreName + "`r`n" +
    "Tenant ID: " + $tenantId + "`r`n" +
    "Client ID: " + $app.ApplicationId + "`r`n" +
    "Client Secret: " + $password + "`r`n" +
    "Subscription ID: " + $subscriptionId + "`r`n" +
    "Resource Group Name: " + $odxResourceGroup + "`r`n" +
    "Data Lake URL: adl://" + $dataLake.Endpoint + "`r`n" +
    "Location: " + $dataFactoryLocation

Out-File C:\Users\MattDyor\Desktop\DataLake.ps1 -InputObject $text1

This is the Azure PowerShell You Are Looking For

I was getting an error telling me to log in to Azure even after I had just logged in:

Run Login-AzureRmAccount to login

There was something out of alignment between the versions of PowerShell and the Azure modules I had installed. After installing this version of PowerShell, I was up and running in no time. I read a number of other articles that told me to do things like Update-Module, and that did not work for me…but your mileage may vary.
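
If you hit the same error, it can help to see which versions are actually installed before reinstalling anything. This is only a minimal sketch, assuming the AzureRM modules are what you have installed (the `AzureRM*` name filter is an assumption about your machine, not something the error message tells you):

```powershell
# Show the PowerShell engine version in use
$PSVersionTable.PSVersion

# List every installed AzureRM module and its version; several versions of
# the same module side by side is one possible source of confusion
Get-Module -ListAvailable -Name AzureRM* |
    Sort-Object Name, Version |
    Format-Table Name, Version

# Log in again, then check that the session actually has a context
Login-AzureRmAccount
Get-AzureRmContext
```

If `Get-AzureRmContext` comes back empty right after a successful login, that points at a module mismatch rather than at your credentials.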

Good luck!



Deleting Multiple Items on Azure

Probably for good reason, there is no easy way to delete a bunch of items from the Azure portal. This means drilling into each of the different items, clicking on delete, possibly confirming something by typing its name into a confirmation screen, waiting a minute for the operation to complete, and then advancing to the next. Totally acceptable for production assets, where you may seriously regret deleting the wrong database (and there is no undo!).

But when you are developing on Azure, you end up creating a ton of assets, and deleting them one by one is slow (and you may actually go on auto-pilot and end up deleting something that DOES matter). Even worse, some assets do not have a delete option in the portal (hello ADF V2…still in preview, so I get it).

The trick to deleting a bunch of items is to use a resource group. Start by navigating to one of the items that you want to delete, click “change” next to resource group, and then you have the option to move a number of items into a new resource group called…trash or something like that. Once all of the items have finished moving to the new resource group, you can delete everything in one fell swoop by deleting the resource group. Pretty clever, huh :).
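
The same trick can be scripted. What follows is a rough sketch using the AzureRM cmdlets; the group name `trash`, the location, and the `scratch-*` name filter are all made up for illustration, and not every resource type supports being moved, so treat it as a starting point rather than a recipe:

```powershell
# Hypothetical names: adjust to your own subscription
$trashGroup = "trash"
$location   = "West US"

# Create the throwaway resource group
New-AzureRmResourceGroup -Name $trashGroup -Location $location

# Pick the resources to delete; here, anything whose name starts with "scratch-"
$resourceIds = Get-AzureRmResource |
    Where-Object { $_.Name -like "scratch-*" } |
    ForEach-Object { $_.ResourceId }

# Review the list before doing anything destructive
$resourceIds

# Move them into the trash group, then delete the whole group (no undo!)
Move-AzureRmResource -DestinationResourceGroupName $trashGroup -ResourceId $resourceIds
Remove-AzureRmResourceGroup -Name $trashGroup -Force
```

Printing `$resourceIds` before the move is deliberate: it is the scripted equivalent of not going on auto-pilot in the portal.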

Have any tips and tricks to share on the Azure portal? I would love to hear them.


Add New Related Table for Entity Framework

If you are building an ASP.NET MVC application (or Rails, or CakePHP) that leverages migrations, the development speed is pretty amazing…until you do something that is not on the critical path. One area that I THINK falls in this category is adding a NEW table that an EXISTING table will use as a foreign key (e.g., the EXISTING table needs to add a reference to the NEW table).

Here is how I did it.

  • Create the NEW model / class / table
  • Update the model for the EXISTING table so that it has a foreign key column for the NEW table
  • NOTE: do NOT add the actual reference from the EXISTING table to the NEW table yet, because this will break referential integrity tests
  • Create a migration for that NEW table and updated EXISTING table
    • Update the migration to specify the default value for the foreign key (e.g., typically it is 0 for ASP.NET MVC, and my first record will have a value of 1, so I specified a default value of 1)
  • Run the migration so that your NEW table will appear in your database
  • Create the CRUD scaffolding for the NEW table (so that you can create a proper entity)
    • You could just insert a record directly into the database, but that may be tricky if you have relationships to user accounts or other logic
  • Using the web interface, create your first entry for your NEW table
    • You will need to do this for each environment…dev/test/production…where you want it to apply; if you just push the final solution to production, you will get an error until you manually create the first record and populate the reference to the first record
    • NOTE: my favorite approach for updating different environments is to point my local dev instance at the production database, run the migrations, and even run the web app – not good for significant implementations, but super fast for smaller projects
    • If needed, manually update your database for the EXISTING table so that all of them point to that newly created record
  • Now add the reference from the NEW table to the EXISTING table, create another migration, and update the database again
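
For reference, the two migration passes above look roughly like this in the Visual Studio Package Manager Console (which is a PowerShell host). This is a sketch only: the migration names are placeholders made up for illustration, while `Add-Migration` and `Update-Database` are the standard Entity Framework console commands.

```powershell
# Pass 1: NEW table plus the foreign key column on the EXISTING table.
# Edit the generated migration file to set the default value
# (1 in the steps above) before applying it.
Add-Migration AddNewTable
Update-Database

# Create the first NEW record through the web interface at this point,
# and update existing rows by hand if needed.

# Pass 2: add the reference between the two tables and apply it
Add-Migration AddNewTableReference
Update-Database
```

Both passes have to be repeated against each environment's database, which is why this cannot be a single deploy.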

It seems a bit hacky, particularly because you cannot just deploy. But, given that it took me longer to write it up than to do it, I went with this approach.

If you have a better approach, I would love to hear it.