Tag Archives: HDFS

Setting up Hadoop HDFS in Pseudodistributed Mode

Well, new to the big data world.

Following Appendix A in the book Hadoop: The Definitive Guide, 4th Ed, just get it to work. I’m running Ubuntu 14.04.

1. Download and unpack the hadoop package, and set environment variables in your ~/.bashrc.

Verify with:

The 2.5.2 distribution package is build in 64bit for *.so files, use the 2.4.1 package if you want 32bit ones.

2. Edit config files in $HADOOP_HOME/etc/hadoop:

3. Config SSH:
Hadoop needs to start daemons on hosts of a cluster via SSH connection. A public key is generated to avoid password input.

Verify with:

4. Format HDFS filesystem:

5. Start HDFS:

Verify running with jps command:

6. Some tests:

7. Stop HDFS:

8. If there is an error like:

Just edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and export JAVA_HOME explicitly again here. It does happen under Debian. Not knowing why the environment variable is not passed over SSH.

9. You can also set HADOOP_CONF_DIR to use a separate config directory for convenience. But make sure you have the whole directory copied from the Hadoop package. Otherwise, nasty errors may occur.