Tech-Today
Hadoop MapReduce Demo
Versions:
Set the following environment variables:
For Windows
Download the Hadoop 3.1.1 binaries for Windows from https://github.com/s911415/apache-hadoop-3.1.0-winutils. Extract them into HADOOP_HOME\bin and make sure to overwrite the existing files.
For Ubuntu
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
The following instructions set up Hadoop in pseudo-distributed mode.
1.) Create the following folders:
HADOOP_HOME/tmp
HADOOP_HOME/tmp/dfs/data
HADOOP_HOME/tmp/dfs/name
2.) Set the following properties in core-site.xml and hdfs-site.xml (replace HADOOP_HOME with the actual installation path):
core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9001</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>HADOOP_HOME/tmp</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///HADOOP_HOME/tmp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///HADOOP_HOME/tmp/dfs/data</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
3.) Run hadoop namenode -format. Don't forget the file:/// prefix in hdfs-site.xml on Windows; otherwise, the format will fail.
4.) Run HADOOP_HOME/sbin/start-dfs.sh (start-dfs.cmd on Windows).
5.) If all goes well, the console log shows the port of the web interface. In my case it's http://localhost:9870.
6.) You can now upload any file through the URL from #5.
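On Ubuntu, steps 1 and 2 above can be sketched as a short shell script. This is only a sketch under a few assumptions: the fallback value of HADOOP_HOME and the scratch directory CONF_DIR are hypothetical names chosen for illustration, and the property snippets are wrapped in the `<configuration>` root element that Hadoop's site files expect. The files are written to the scratch directory first so you can review them before copying them to $HADOOP_HOME/etc/hadoop.

```shell
# Assumption: HADOOP_HOME points at the unpacked Hadoop directory.
# The fallback is only a placeholder so the sketch runs on its own.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"
# Hypothetical scratch directory; the real files live in $HADOOP_HOME/etc/hadoop.
CONF_DIR="${CONF_DIR:-./hadoop-conf-sketch}"

# Step 1: create the tmp, name node and data node folders.
# -p creates missing parents and is harmless if the folders already exist.
mkdir -p "$HADOOP_HOME/tmp/dfs/data" "$HADOOP_HOME/tmp/dfs/name" "$CONF_DIR"

# Step 2: write both site files with the real path substituted.
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HADOOP_HOME/tmp</value>
  </property>
</configuration>
EOF

# On Linux HADOOP_HOME starts with "/", so file://$HADOOP_HOME gives file:///...
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$HADOOP_HOME/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$HADOOP_HOME/tmp/dfs/data</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
EOF

echo "Review the files in $CONF_DIR, then copy them to $HADOOP_HOME/etc/hadoop"
# Afterwards continue with steps 3 and 4:
#   "$HADOOP_HOME/bin/hadoop" namenode -format
#   "$HADOOP_HOME/sbin/start-dfs.sh"
```

Keeping the generated files in a separate folder avoids overwriting an existing configuration by accident.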
Now let's create a project that will test our Hadoop setup, or download an existing one. For example, this project: https://www.guru99.com/create-your-first-hadoop-program.html. It comes with a nice explanation, so let's try it. I've repackaged it into a Maven (pom) project and uploaded it to GitHub at https://github.com/czetsuya/Hadoop-MapReduce.
- Clone the repository.
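- Open the HDFS URL from #5 above and create an input and an output folder. (If the job later complains that the output directory already exists, remove the output folder and let the job create it.)
- In the input folder, upload the file SalesJan2009 from the project's root folder.
- Run hadoop jar hadoop-mapreduce-0.0.1-SNAPSHOT.jar /input /output.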
- Check the output from the URL and download the resulting file.
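If you prefer the command line over the web interface, the same steps can be sketched with the HDFS shell. This is a sketch under assumptions: the file name SalesJan2009.csv (the extension is a guess), the jar location under target/ after a Maven build, and the choice to let the job create /output itself are not confirmed by the project.

```shell
# Assumptions: exact file name/extension in the repo root, and a jar built by Maven.
SALES_FILE="SalesJan2009.csv"
JAR="target/hadoop-mapreduce-0.0.1-SNAPSHOT.jar"

if command -v hdfs >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
  # Create the input folder in HDFS and upload the data file (-f overwrites).
  hdfs dfs -mkdir -p /input &&
  hdfs dfs -put -f "$SALES_FILE" /input/ &&
  # Run the job; /output is left for the job to create.
  hadoop jar "$JAR" /input /output &&
  # List the result files and peek at the first lines.
  hdfs dfs -ls /output &&
  hdfs dfs -cat '/output/part-*' | head -n 20
else
  echo "hadoop/hdfs not found on PATH; add \$HADOOP_HOME/bin and retry" >&2
fi
```

The commands are chained with `&&` so the job does not start if the upload failed.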
To run Hadoop in standalone mode, download and unpack it as is. Go to the project's folder, build it using Maven, then run the hadoop command below:
$ $HADOOP_HOME/bin/hadoop jar target/hadoop-mapreduce-0.0.1-SNAPSHOT.jar input output
input - a directory that should contain the csv file
output - a directory that will be created after launch. The output file will be saved here.
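The standalone run can be sketched end to end as follows. Treat it as an illustration only: the file name SalesJan2009.csv and the use of `mvn clean package` as the build command are assumptions, and the script removes a leftover local output folder because MapReduce jobs typically refuse to write into an existing one.

```shell
# Assumptions: file name/extension of the data file and the Maven build command.
SALES_FILE="SalesJan2009.csv"
JAR="target/hadoop-mapreduce-0.0.1-SNAPSHOT.jar"

if command -v mvn >/dev/null 2>&1 && [ -x "$HADOOP_HOME/bin/hadoop" ] && [ -f "$SALES_FILE" ]; then
  # Prepare the local input directory with the csv file.
  mkdir -p input && cp "$SALES_FILE" input/
  # Remove results of a previous run; the job creates output itself.
  rm -rf output
  # Build the jar, run the job on the local file system, show the result.
  mvn -q clean package &&
  "$HADOOP_HOME/bin/hadoop" jar "$JAR" input output &&
  cat output/part-*
else
  echo "need mvn on PATH, HADOOP_HOME set, and $SALES_FILE in the current folder" >&2
fi
```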
Common causes of problems:
- Improperly configured core-site.xml or hdfs-site.xml, especially the data node and name node directories
- File or folder permissions
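Both causes can be checked quickly from a shell. The sketch below assumes the folder layout from step 1; `check_dir` is a hypothetical helper written for this post, while `hdfs getconf -confKey` and `jps` are the standard Hadoop and JDK tools.

```shell
# Assumption: HADOOP_HOME points at the unpacked Hadoop directory.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"

# Hypothetical helper: report whether a directory exists and is writable.
check_dir() {
  if [ -d "$1" ] && [ -w "$1" ]; then
    echo "ok: $1"
  else
    echo "PROBLEM: $1 is missing or not writable"
  fi
}

# File / folder permissions of the directories created in step 1.
for d in tmp tmp/dfs/name tmp/dfs/data; do
  check_dir "$HADOOP_HOME/$d"
done

# Values Hadoop actually picked up from core-site.xml and hdfs-site.xml.
if command -v hdfs >/dev/null 2>&1; then
  for key in fs.defaultFS dfs.namenode.name.dir dfs.datanode.data.dir; do
    echo "$key = $(hdfs getconf -confKey "$key")"
  done
fi

# Are the NameNode and DataNode processes running?
if command -v jps >/dev/null 2>&1; then
  jps | grep -E 'NameNode|DataNode' || echo "no NameNode/DataNode process found"
fi
```

If the printed directories differ from the ones you created, the site files are probably not in the folder Hadoop reads its configuration from.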
References
- https://www.guru99.com/create-your-first-hadoop-program.html
- https://github.com/czetsuya/Hadoop-MapReduce
- https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation