Setting up an Azure Data Lake and Azure Data Factory using PowerShell

Login-AzureRmAccount

#first ensure that you have an Azure Data Lake that you want to use for ODX
#$resourceGroups = Get-AzureRmResourceGroup
#$azureDataLakeNames = "";

# foreach ($resourceGroup in $resourceGroups) {
# $azureDataLake = Get-AzureRmDataLakeStoreAccount -ResourceGroupName $resourceGroup.ResourceGroupName
#$azureDataLake
# $azureDataLakeName = $azureDataLake.Name
# $azureDataLakeNameLength = $azureDataLakeName.Length
# $azureDataLakeNameLength -gt 0
# if ($azureDataLakeNameLength -gt 0) {
# $azureDataLakeNames += " " + $azureDataLake.Name + " (resource group: " + $resourceGroup.ResourceGroupName + " & location: " + $resourceGroup.Location + ")"
# }
# }
# "-----------"
#"DataLakeNames: " + $azureDataLakeNames
#----------------------------------------------------------------
#----------------------------------------------------------------
#REQUIRED: you must enter a unique appname which will be used as the security principal
$appname = "sune"
#OPTIONAL: change the password for the security principal
$password = "Xyzpdq"
#run the commented-out script above, and set the name/resource group/location below to match your existing data lake store; or enter a new name to have a data lake created
$dataLakeStoreName = "sunelake"
$odxResourceGroup = "odxDemo"
$dataLakeLocation = "Central US" #Central US, East US 2, North Europe
#recommended to use the same resource group as the data lake for simplicity, but you can use any resource group or enter a new name to create one
$dataFactoryResourceGroup = $odxResourceGroup
#specify where you want your data factory; current options are East US, North Europe, West Central US, and West US
$dataFactoryLocation = "West US"
#----------------------------------------------------------------
#----------------------------------------------------------------

#create odxResourceGroup, if it does not exist
Get-AzureRmResourceGroup -Name $odxResourceGroup -ErrorVariable notPresent1 -ErrorAction SilentlyContinue
if ($notPresent1)
{
    New-AzureRmResourceGroup -Location $dataLakeLocation -Name $odxResourceGroup
}

#create data lake, if it does not exist
Get-AzureRmDataLakeStoreAccount -Name $dataLakeStoreName -ErrorVariable notPresent2 -ErrorAction SilentlyContinue
if ($notPresent2)
{
    New-AzureRmDataLakeStoreAccount -Location $dataLakeLocation -Name $dataLakeStoreName -ResourceGroupName $odxResourceGroup
}

$homepage = "https://ODXPS.com/" + $appname

#create the application and its service principal (no existence check is made here)
$app = New-AzureRmADApplication -DisplayName $appname -HomePage $homepage -IdentifierUris $homepage -Password $password
$app = Get-AzureRmADApplication -DisplayName $appname

$servicePrincipal = New-AzureRmADServicePrincipal -ApplicationId $app.ApplicationId
#wait for the new service principal to propagate before assigning roles
Start-Sleep 10
New-AzureRmRoleAssignment -RoleDefinitionName "Contributor" -Id $servicePrincipal.Id -ResourceGroupName $odxResourceGroup
New-AzureRmRoleAssignment -RoleDefinitionName "Data Factory Contributor" -Id $servicePrincipal.Id -ResourceGroupName $odxResourceGroup
New-AzureRmRoleAssignment -RoleDefinitionName "Reader" -Id $servicePrincipal.Id -ResourceGroupName $odxResourceGroup

#Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path / -AceType User -Id $app.ApplicationId -Permissions All
Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path / -AceType User -Id $servicePrincipal.Id -Permissions All
#create the /ODX folder, if it does not exist
Get-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path /ODX -ErrorVariable notPresent3 -ErrorAction SilentlyContinue
if ($notPresent3)
{
    New-AzureRmDataLakeStoreItem -Folder -AccountName $dataLakeStoreName -Path /ODX
}
Set-AzureRmDataLakeStoreItemAclEntry -AccountName $dataLakeStoreName -Path /ODX -AceType User -Id $servicePrincipal.Id -Permissions All
#Start-Sleep 60 #there seems to be a lag between when these permissions are added and when they are applied...trying 1 minute to start

$subscription = Get-AzureRmSubscription
$subscriptionId= ($subscription).Id
$tenantId = ($subscription).TenantId

#ensure there are permissions
#Get-AzureRmDataLakeStoreItemAclEntry -Account $dataLakeStoreName -Path /

#get information on datalake
$dataLake = Get-AzureRmDataLakeStoreAccount -Name $dataLakeStoreName

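#here is a printout
"---------------------------------------------------------------"
"---------------------------------------------------------------"
$text1 = "Azure Data Lake Name: " + $dataLakeStoreName + "`r`n" +
"Tenant ID: " + $tenantId + "`r`n" +
"Client ID: " + $app.ApplicationId + "`r`n" +
"Client Secret: " + $password + "`r`n" +
"Subscription ID: " + $subscriptionId + "`r`n" +
"Resource Group Name: " + $odxResourceGroup + "`r`n" +
"Data Lake URL: adl://" + $dataLake.Endpoint + "`r`n" +
"Location: " + $dataFactoryLocation
#print the details (the assignment above does not produce any output on its own)
$text1
"---------------------------------------------------------------"
"---------------------------------------------------------------"

#save the same details to a file
Out-File C:\Users\MattDyor\Desktop\DataLake.ps1 -InputObject $text1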

This is the Azure PowerShell You Are Looking For

https://github.com/Azure/azure-powershell/releases/tag/v3.7.0-March2017

I was getting an error telling me to login to Azure even after I had just logged in:

Run Login-AzureRmAccount to login

Something was out of alignment between the versions of PowerShell and the Azure modules I had installed. After installing this version of Azure PowerShell, I was up and running in no time. I read a number of other articles that told me to do things like Update-Module, and that did not work for me…but your mileage may vary.

Good luck!

Matt

 

Deleting Multiple Items on Azure

Probably for good reason, there is no easy way to delete a bunch of items from the Azure portal. This means drilling into each of the different items, clicking on delete, possibly confirming something by typing its name into a confirmation screen, waiting a minute for the operation to complete, and then advancing to the next. Totally acceptable for production assets, where you may seriously regret deleting the wrong database (and there is no undo!).

But, when you are developing on Azure, you end up creating a ton of assets, and deleting them one by one is slow (and you actually may go on auto-pilot and end up deleting something that DOES matter). Even worse, some assets do not have a delete option in the portal (hello ADF V2…still preview, so I get it).

The trick to deleting a bunch of items is to use a resource group. Start by navigating to one of the items that you want to delete, click “change” next to resource group, and then you have the option to move a number of items into a new resource group called…trash or something like that. Once all of the items have finished moving to the new resource group, you can delete everything in one fell swoop by deleting the resource group. Pretty clever, huh? :)

Have any tips and tricks to share on the Azure portal? I would love to hear them.

Matt

Add New Related Table for Entity Framework

If you are building an ASP.NET MVC application (or Rails, or CakePHP) that leverages migrations, the development speed is pretty amazing…until you do something that is not on the critical path. One area that I THINK falls in this category is adding a NEW table that an EXISTING table will use as a foreign key (e.g., the EXISTING table needs to add a reference to the NEW table).

Here is how I did it.

  • Create the NEW model / class / table
  • Update the model for the EXISTING table having a reference to the NEW table
  • NOTE: do NOT add a reference from the EXISTING table to the NEW table, because this will break referential integrity tests
  • Create a migration for that NEW table and updated EXISTING table
    • Update the migration to specify the default value for the foreign key (e.g., it is typically 0 for ASP.NET MVC, and my first record will have a value of 1, so I specified a default value of 1)
  • Run the migration so that your NEW table will appear in your database
  • Create the CRUD scaffolding for the NEW table (so that you can create a proper entity)
    • you can just insert a record directly into the database, but that may be tricky if you have relationships to user accounts or other logic
  • Using the web interface, create your first entry for your NEW table
    • You will need to do this for each environment…dev/test/production…where you want it to apply; if you just push the final solution to production, you will get an error until you manually create the first record and populate the reference to the first record
    • NOTE: my favorite approach for updating different environments is to point my local dev instance at the production database, run the migrations, and even run the web app; not good for significant implementations, but super fast for smaller projects
    • If needed, manually update your database for the EXISTING table so that all of its rows point to that newly created record
  • Now add the reference from the NEW table to the EXISTING table, create another migration, and update the database again

It seems a bit hacky, particularly because you cannot just deploy. But, given that it took me longer to write it up than to do it, I went with this approach.

If you have a better approach, I would love to hear it.

Regards,

Matt

Why does Google care so much about bounce rate?

Google heavily promotes one statistic, the bounce rate, making it available in most of its reports and even on the accounts overview page (the only other statistic shown there is the number of sessions). Google's focus on bounce rate makes a bit of sense: Google sells recommendations for a living through its advertising platforms, and a high bounce rate means that it is doing a rotten job of matching consumer intent with advertiser content.

So what is bounce rate?

Bounce rate measures the percentage of visitors who visit one and only one page of your website.

Bounce Rate = Single Page Visitors / Total Visitors

Bounce Rate = (Total Visitors - Multiple Page Visitors) / Total Visitors

 

Why provide two very similar equations for bounce rate? I wanted to highlight the easiest way to improve your bounce rate: increase the number of multiple page visitors. Preventing visitors from bouncing is hard…you are trying to accomplish a negative. Getting visitors to visit a 2nd page is a lot more actionable. To improve your bounce rate, all you need to do is get more people to click on a second page. Read that one twice. Go find the pages that are causing most of your bounce problems and see whether you have made it appealing to visit a second page. Are there too many navigational elements? Are you providing your entire website on the page? Once you start thinking about how to get people to visit a 2nd page, you will find some easy ways to improve your bounce rate.
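To make the arithmetic concrete, here is a small sketch of the second form of the equation. The function name and the visitor counts are made up for illustration; they are not taken from any real report.

```javascript
// Bounce rate = (total visitors - multiple page visitors) / total visitors.
// bounceRate is a hypothetical helper; the counts below are illustrative only.
function bounceRate(totalVisitors, multiplePageVisitors) {
  if (totalVisitors <= 0) {
    throw new RangeError("totalVisitors must be positive");
  }
  const singlePageVisitors = totalVisitors - multiplePageVisitors;
  return singlePageVisitors / totalVisitors;
}

// 1000 visitors, 400 of whom viewed more than one page.
const before = bounceRate(1000, 400);
// Same traffic, but 100 more visitors clicked through to a second page.
const after = bounceRate(1000, 500);

console.log(before); // 0.6
console.log(after);  // 0.5
```

Nothing about the total traffic changed in this example; the bounce rate dropped only because more visitors reached a second page.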

Are there better metrics?

You bet. Having a low bounce rate as a goal is like looking to purchase a car that does not explode: it is a good minimum standard, but you should aim higher. Ideally, your metrics will include a visitor purchase where money changes hands. But starting with measuring purchase activity may not provide enough insight into what is happening between that first visit and the eventual purchase. According to Google Analytics, 98% of visitors NEVER make a purchase, so figuring out where along the path customers are dropping off can help you identify where you should focus your efforts.

Why does Google obsess over bounce rate?

The genius of bounce rate is that it is the easiest “quality” metric to measure on the planet. There is no need to ask what a consumer actually wanted, or to ask an advertiser about its campaign goals. Bounce rate allows you to abstract away all of the complexity of a real-world business relationship and ask a simple question: did the visitor click on a link after following our recommendation to visit the site?

Businesses should take time to define more meaningful goals, and revisit their goals from time to time. Typical goals will include activation (e.g., visiting a certain number of pages, reading a white paper, revisiting the site), signup (e.g., for a newsletter, a trial, or a demo), and purchase.
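As a rough sketch of how you might see where visitors drop off between goals like these, the snippet below computes the share of visitors lost at each step. The stage counts are invented for illustration (chosen so that 2% of visitors purchase), and the helper name is hypothetical.

```javascript
// Made-up counts for illustration; the stage names follow the goals listed above.
const funnel = [
  { stage: "visit", visitors: 1000 },
  { stage: "activation", visitors: 300 },
  { stage: "signup", visitors: 60 },
  { stage: "purchase", visitors: 20 },
];

// For each step, compute the share of visitors lost since the previous stage.
function dropOffs(stages) {
  const result = [];
  for (let i = 1; i < stages.length; i++) {
    const previous = stages[i - 1].visitors;
    const lost = previous - stages[i].visitors;
    result.push({
      from: stages[i - 1].stage,
      to: stages[i].stage,
      dropOffRate: previous === 0 ? 0 : lost / previous,
    });
  }
  return result;
}

const steps = dropOffs(funnel);
// The step with the highest drop-off rate is a candidate for where to focus first.
const worst = steps.reduce((a, b) => (b.dropOffRate > a.dropOffRate ? b : a));
console.log(worst.from + " -> " + worst.to);
```

With these invented numbers, the largest proportional loss is between activation and signup, even though the largest absolute loss is at the first step. Which of the two matters more depends on your business.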

Matt

Note: I published this article for Payboard a while back, and I am republishing it here since the Payboard blog is no more.

Skipping Over Quill Buttons with Tabs

I am a big fan of Quill. You can add it to your website, and in minutes have rich text capability (like adding links to the Quill website to your text).

I am also a big, dare I say HUGE fan of tabbing when I am filling out forms. By default, Quill controls (like bold, italic, link, etc) accept focus when you tab, so if a user is filling out and tabbing through a form, when they get to the rich text editor they have to (aaahh!) grab their mouse and navigate into the rich text box.

The short answer is to use jQuery to add a tabindex of -1 (do not allow tab focus) to every button within the .ql-toolbar class, like this:

$(".ql-toolbar").find(":button").attr('tabindex', '-1');
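If you are not using jQuery, the same idea can probably be sketched with plain DOM calls. This is not the code I run on my site: skipToolbarButtons is a hypothetical helper name, and the selector is my approximation of what jQuery's :button matches (button elements and inputs of type button).

```javascript
// Sketch: remove Quill toolbar buttons from the tab order without jQuery.
// skipToolbarButtons is a hypothetical helper name, not part of Quill.
function skipToolbarButtons(root) {
  // Approximates jQuery's ":button": <button> elements and <input type="button">.
  const buttons = root.querySelectorAll(
    '.ql-toolbar button, .ql-toolbar input[type="button"]'
  );
  buttons.forEach((button) => button.setAttribute("tabindex", "-1"));
  // Return how many buttons were updated, which is handy when debugging.
  return buttons.length;
}

// In a browser, call this after the Quill editor (and its toolbar) has been created.
if (typeof document !== "undefined") {
  skipToolbarButtons(document);
}
```

Passing the root element in as a parameter keeps the helper easy to try out against a single editor container instead of the whole page.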

For good measure, here is the complete Quill code that I am using. It also handles tabbing out of the editor (so when you hit tab you go to the next field). Let me know if you know how to make this better. Thanks!

Matt