Quick Hit: Common Ways to Interact with Hadoop

MapReduce: geniuses only. If you are on this page, read the next option!

Pig: a platform whose scripting language is called Pig Latin. It lets you describe queries over Hadoop data as a series of SQL-like steps. Originally developed at Yahoo. Easy to learn.

input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
-- Extract words from each line and put them into a pig bag
-- datatype, then flatten the bag to get one word on each row
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- filter out any words that are just white spaces
filtered_words = FILTER words BY word MATCHES '\\w+';
-- create a group for each word
word_groups = GROUP filtered_words BY word;
-- count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
-- order the records by count
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
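
If Pig Latin is new to you, it can help to see the same steps in plain Python. This is only a rough sketch of what the script computes, not how Pig runs it: the function name word_count is made up for illustration, and line.split() is a simplification that only breaks on whitespace, whereas Pig's TOKENIZE also breaks on a few punctuation characters.

```python
import re
from collections import Counter


def word_count(lines):
    """Return (count, word) pairs, highest count first."""
    counts = Counter()
    for line in lines:
        # TOKENIZE + FLATTEN: one token per record (whitespace-only split here)
        for token in line.split():
            # FILTER ... MATCHES '\\w+': the whole token must be word characters
            if re.fullmatch(r"\w+", token):
                # GROUP ... BY word, then COUNT the entries in each group
                counts[token] += 1
    # ORDER ... BY count DESC
    return sorted(((n, w) for w, n in counts.items()), key=lambda pair: -pair[0])


if __name__ == "__main__":
    for count, word in word_count(["a b a", "b a c!"]):
        print(count, word)
```

The point of Pig is that the same handful of steps runs across a cluster without you writing the loop yourself.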

Hive: originally built by Facebook, a social networking site (you knew you would learn something). It has a SQL-like language called HiveQL. The queries are translated into MapReduce, Tez, or Spark jobs.

CREATE TABLE docs (line STRING);
CREATE TABLE word_counts AS
SELECT word, count(1) AS count FROM
 -- the backslash is doubled so that the regex \s reaches split() intact
 (SELECT explode(split(line, '\\s')) AS word FROM docs) temp
GROUP BY word
ORDER BY word;

Oozie: an orchestration framework that allows you to string together different MapReduce, Pig, and Hive jobs.



Error when you Add New Scaffold Item for ApplicationUser

Here is one that took me a couple of minutes to figure out: if you are using ASP.NET MVC, and you select Add New Scaffold Item on ApplicationUser, you will get the following error:

Severity Code Description Project File Line Column Suppression State
Error CS0453 The type ‘string’ must be a non-nullable value type in order to use it as parameter ‘T’ in the generic type or method ‘Nullable<T>’ INV..NETCoreApp,Version=v1.0

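This happens because the scaffolding assumes that Id fields are value types such as integers, and generates nullable parameters for them (e.g., int? id). The ApplicationUser Id is a string, which is a reference type: it can already be null, so it cannot be wrapped in Nullable<T>. This is a quick 1-minute fix. Just go and remove all of the question marks after string in the generated code. For example: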

Details(string? id)

becomes

Details(string id)

Super simple.

Until next time…


Hey Microsoft People: Anybody Have an All-Up ASP.NET Core Sample Application?

ASP.NET Core is the newest incarnation of ASP.NET. It allows developers to deploy their applications on more than just IIS or Azure: the same application can also run on Linux servers. This marks a pretty significant shift for Microsoft, and I absolutely love that Microsoft is branching out.

But I do have a simple question: has anybody provided a comprehensive sample application that shows how all of these Core components work together? There are a ton of great samples on the web for building out sample apps in previous versions of ASP.NET, but I have not found anything that demonstrates an end-to-end yet basic ASP.NET Core application that includes ASP.NET Core MVC and ASP.NET Identity Core.

Let’s get specific: I want to see something like a blog application with multiple authors and a blog administrator, where the users can create and edit/delete their own posts, create and edit/delete their own comments on other people’s posts, and an administrator can create, edit, and delete anything. I actually do not care what the application is, but I do want to see some navigation properties (and whether Lazy Loading works or not, and if it does not work, the right way to get related data loaded), something that leverages Identity Core to manage authentication, authorization, and content (e.g., I want to click on an author’s name, and see all of the posts that they have written).

If you want to get fancy: include minimalist testing (e.g., based on your experience, what are some smart tests to include right off the bat), use Visual Studio 2015, add some additional properties to an Application User (from the ASP.NET Identity Core), create roles and associate users with roles (e.g., admin and author), showcase how to use Scaffolding in an agile manner (e.g., as you change the data model, do you just delete the controller and then add using Scaffolding again…or what is the practice).

It feels like so much code is being developed that the communication of how to leverage the code has become highly siloed, and it is difficult to stitch together the right way to do things from a comprehensive perspective. BUT, a simple sample project (and ideally a quick write-up of how it works) would work wonders to bring more people under the MVC Core umbrella.

If this post already exists, please show me the way. If it does not yet exist, but you could create such a thing, please spend an hour writing out the bullet points or creating a screen recording, and I can convert it into a (temporarily) awesome blog post for you.

Thanks in advance.


P.S. If you do not have this magic blog post that covers ASP.NET Core together with ASP.NET Core MVC and ASP.NET Identity Core, but you would like to see how all these technologies play together, please drop a comment!

Some useful articles:

Resource-based authorization in ASP.NET Core: https://docs.microsoft.com/en-us/aspnet/core/security/authorization/resourcebased

Setting Up Goals in Google Analytics and Google Tag Manager

Set Up a Destination Goal

If you want to start improving the business performance of your website, you have to have a goal, sometimes called a conversion event. If your goal is a destination (e.g., a “thank you” or confirmation page), Google Analytics makes it easy. Just click on the Admin tab along the top, scroll over to View, and select Goals. Select Create New Goal, give your goal a name (e.g., Purchase), select Destination, and then enter the part of the URL following your domain name (e.g., if your page is http://www.mysite.com/success, just enter /success).
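
If you have a long list of pages and want to pull out the part to enter without eyeballing each URL, here is a minimal Python sketch. The function name goal_destination is made up for illustration, and the sketch assumes you only care about the path, so any query string is dropped.

```python
from urllib.parse import urlsplit


def goal_destination(page_url):
    """Return the path portion of a URL, i.e. the part after the domain name."""
    path = urlsplit(page_url).path
    # A bare domain has an empty path; treat it as the site root
    return path or "/"


if __name__ == "__main__":
    print(goal_destination("http://www.mysite.com/success"))  # prints /success
```

If your confirmation page is only distinguishable by its query string, this sketch is too simple and you will need to keep that part as well.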

To make sure everything is set up properly, you can click the Verify link. If you get this message:

This Goal would have a 0% conversion rate based on your data from the past 7 days. Try a different set up.

You may have misconfigured your goal, the goal may not have been reached yet (e.g., nobody has visited that page), or the data may still be working its way through the internets (I had 12 hours between when I started firing an event and when the first completion was recorded).

Set Up an Event Goal using Google Tag Manager and Google Analytics

Occasionally, you may want to fire a goal on an event, such as a button click or a form submission that posts outside of the host domain (hello Paypal). This is a bit trickier. The best way to do it is to use a combination of Google Tag Manager and Google Analytics.

Create the Trigger

In Google Tag Manager, you first need to create a trigger for the event that you want to capture.

Triggers > New > Click > Just Links.

This is where I play a little dangerous. I do encourage waiting for tags for up to 2 seconds (2000 in the milliseconds box). This means that you give Google up to 2 seconds to record the click before following the link…a bit of delay for the user, but you will not be losing (as many) clicks to fast page transitions. You will want to validate that the link or form submission still works afterwards, because the delay may interfere with validations or other scripts that were built without expecting it.

The rest of the selections are basic, so we can now jump to the 2nd part of the process…

Create the Tag

Still in Google Tag Manager, create a New tag.

Tags > New > Google Analytics > Universal Analytics > Add your Google Analytics Tracking ID > Track Type = Event > Action = “BuyLinkClick” (or whatever) > Fire on Click > Select the Trigger you just created. Phew!

Now you are pushing the events to your GA account. If you head over to Google Analytics > Real-Time > Events, you should be able to see it as you click these links.

Create the Goal

Back in Google Analytics, go to Admin, scroll over to View, and click Goals. New Goal > Event > Action = “BuyLinkClick”. Odds are, if you click on Verify, you will get this message:

This Goal would have a 0% conversion rate based on your data from the past 7 days. Try a different set up.

You may have misconfigured your goal, the goal may not have been reached yet (e.g., nobody has clicked the link), or the data may still be working its way through the internets (I had 12 hours between when I started firing an event and when the first completion was recorded).

Come back tomorrow to see if this goal is being tracked properly.