Setting up Hadoop HDFS in Pseudodistributed Mode

Well, new to the big data world.

Following Appendix A in the book Hadoop: The Definitive Guide, 4th Ed, just get it to work. I’m running Ubuntu 14.04.

1. Download and unpack the hadoop package, and set environment variables in your ~/.bashrc.

Verify with:

The 2.5.2 distribution package is build in 64bit for *.so files, use the 2.4.1 package if you want 32bit ones.

2. Edit config files in $HADOOP_HOME/etc/hadoop:

3. Config SSH:
Hadoop needs to start daemons on hosts of a cluster via SSH connection. A public key is generated to avoid password input.

Verify with:

4. Format HDFS filesystem:

5. Start HDFS:

Verify running with jps command:

6. Some tests:

7. Stop HDFS:

8. If there is an error like:

Just edit $HADOOP_HOME/etc/hadoop/ and export JAVA_HOME explicitly again here. It does happen under Debian. Not knowing why the environment variable is not passed over SSH.

9. You can also set HADOOP_CONF_DIR to use a separate config directory for convenience. But make sure you have the whole directory copied from the Hadoop package. Otherwise, nasty errors may occur.

Enabling Fail2ban

Hundreds lines of log in wordpress show that, attackers are just trying passwords via xmlrpc.php. Add protection using the WP fail2ban plugin, inspired by the post here.

Coroutines in C++/Boost

Starting with 1.56, boost/asio provides asio::spawn() to work with coroutines. Just paste the sample code here, with minor modifications:

The Python in my previous article can be used to work with the code above. I also tried to write a TCP server with only boost::coroutines classes. select() is used, since I want the code to be platform independent. NOTE: with coroutines, we have only _one_ thread.

Coroutines in Python

Python 3.5 added native support for coroutines. Actually, there were several steps towards the current implementation. See Wikipedia, and it seems a bit messy to me:

  • Python 2.5 implements better support for coroutine-like functionality, based on extended generators (PEP 342).
  • Python 3.3 improves this ability, by supporting delegating to a subgenerator (PEP 380).
  • Python 3.4 introduces a comprehensive asynchronous I/O framework as standardized in PEP 3156, which includes coroutines that leverage subgenerator delegation.
  • Python 3.5 introduces explicit support for coroutines with async/await syntax (PEP 0492).

Before Python 2.5, there were only generators.

In Python 2.5, yield was refined to be an expression rather than a statement, which gave the possibility to implement a simple coroutine. But still a lot of work left for programmers to use it. For instance, a simple conroutine scheduler was required.

In Python 3.3, yield from was added to support subgenerators. Nothing to do with coroutines.

In Python 3.4, the Father of Python (Guido van Rossum) wrote a PEP himself to add an asyncio module to simplify coroutine usage in Python. An official scheduler was added. We can use @asyncio.coroutine to decorate a function. We can use yield from expressions to yield to a specific coroutine.

In Python 3.5, async/await syntax was added, borrowed from C#. The newest PEP made coroutines a native Python language feature, and clearly separated them from generators. A native coroutine now declares with async def syntax, and yield from is replaced with await expression. This removes generator/coroutine ambiguity. So in Python 3.5, coroutines used with asyncio may be implemented using the async def statement, or by using generators. Generator-based coroutines should be decorated with @asyncio.coroutine, although this is not strictly enforced. The decorator enables compatibility with async def coroutines, and also serves as documentation. See Python documents here.

The implementation can be found in this commit.

I wrote a echo server/client sample to try corutines. Server code first:

Client code here, or you can simply use telnet command:

Server output:

Client output:

With Python 3.5 on Ubuntu 16.04, we can also use async/await:

Basic Usage of Boost MultiIndex Containers

Just take a simple note here.
The Boost Multi-index Containers Library provides a class template named multi_index_container which enables the construction of containers maintaining one or more indices with different sorting and access semantics.


To use with pointer values, only limited change needed as highlighted: