Monday, September 19, 2016

Python Modules and Packages





Working directory

PyLearning
├── Animals
├── foo
├── README.md
└── test


foo/
├── bar.py
├── fibo.py
├── fibo.pyc
├── __init__.py
└── __init__.pyc


test/
├── Backwards.py
├── Backwards.pyc
├── callFibo.py
├── callFibo.pyc
├── Card.py
├── Foo.py
├── Foo.pyc
├── __init__.py
├── __init__.pyc
├── mystuff.py
├── mystuff.pyc
└── support.py



fibo.py
def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

def fib2(n):   # print Fibonacci series up to n as a list
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    print result

def fib3(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b




callFibo.py

from foo.fibo import fib, fib2, fib3

print('Call to fib3()')
fib3(100)
print('\n')

print('Call to fib2()')
fib2(100)
print('\n')

print('Call to fib()')
fib(1000)
print('\n')   

print('Call to instance of fib()')
fib = fib   # no-op: rebinds the name fib to the same function object
fib(500)

Output

Call to fib3()
1 1 2 3 5 8 13 21 34 55 89 

Call to fib2()
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]


Call to fib()
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 

Call to instance of fib()
1 1 2 3 5 8 13 21 34 55 89 144 233 377
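
The fibo.py listing uses Python 2 print statements, and fib2 prints its list instead of returning it. As a rough sketch only (not the original fibo.py), a Python 3 variant that returns the series could look like this. The name fib_list is made up for illustration.

```python
def fib_list(n):
    """Return the Fibonacci numbers below n as a list (hypothetical helper)."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result


if __name__ == "__main__":
    # Same limit as the fib2(100) call in callFibo.py
    print(fib_list(100))
```

Returning the list lets the caller decide whether to print it, slice it, or pass it on.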



















Wednesday, September 14, 2016

Ubuntu Port forwarding




Port forwarding







sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 9000 -j DNAT --to-destination 10.0.2.129:9000
sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 9090 -j DNAT --to-destination 10.0.2.129:9090
sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 50070 -j DNAT --to-destination 10.0.2.129:50070
sudo iptables -I FORWARD -m state -d 10.0.2.0/24 --state NEW,RELATED,ESTABLISHED -j ACCEPT
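
The three PREROUTING rules differ only in the port number. Here is a small sketch, assuming the same external address (192.168.1.106) and internal host (10.0.2.129) as in the rules above, that prints the commands instead of running them. The helper name dnat_rule is hypothetical.

```python
EXTERNAL_IP = "192.168.1.106"   # address the packets arrive on (from the rules above)
INTERNAL_IP = "10.0.2.129"      # host that receives the forwarded traffic
PORTS = [9000, 9090, 50070]


def dnat_rule(port, external_ip=EXTERNAL_IP, internal_ip=INTERNAL_IP):
    """Build one DNAT command line as a string (hypothetical helper)."""
    return (
        "sudo iptables -t nat -I PREROUTING -p tcp "
        f"-d {external_ip} --dport {port} "
        f"-j DNAT --to-destination {internal_ip}:{port}"
    )


if __name__ == "__main__":
    # Print only; review the output before pasting it into a shell.
    for port in PORTS:
        print(dnat_rule(port))
```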





Saturday, September 10, 2016

Puppet



Puppet comes in 2 distributions:

1. WEBrick Puppet (Apache) - services are named puppetmaster, puppetagent, etc.
2. Puppet Labs (used in this tutorial)


Puppet Server: Installing From Packages

Puppet Collections and packages

$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-xenial.deb
$ sudo dpkg -i puppetlabs-release-pc1-xenial.deb
$ sudo apt update
$ sudo apt-get install puppetserver
$ sudo systemctl start puppetserver


Puppet agent: Linux

Puppet Collections and packages

$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
$ sudo dpkg -i puppetlabs-release-pc1-trusty.deb

$ sudo apt-get update

** Before installing and configuring the agent, do not forget to add the Puppet Server (master) to /etc/hosts on the agent side. The default name of the Puppet Server is puppet, so map that name to the correct IP address.
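
A quick way to confirm the mapping is in place is to scan the hosts file for the name. This is only a sketch: find_host_ip is a hypothetical helper, and 10.0.2.10 in the sample text is a placeholder, not the real master address.

```python
def find_host_ip(hosts_text, name="puppet"):
    """Return the IP mapped to `name` in /etc/hosts-style text, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and padding
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if name in names:
            return ip
    return None


if __name__ == "__main__":
    # Placeholder content; on a real agent, read /etc/hosts instead.
    sample = "127.0.0.1 localhost\n10.0.2.10 puppet\n"
    print(find_host_ip(sample))
```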

ubuntu@node1:~$ sudo vi /etc/hosts
ubuntu@node1:~$ 
ubuntu@node1:~$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
--2016-09-10 22:52:47--  https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
Resolving apt.puppetlabs.com (apt.puppetlabs.com)... 198.58.114.168, 2600:3c00::f03c:91ff:fe69:6bf0
Connecting to apt.puppetlabs.com (apt.puppetlabs.com)|198.58.114.168|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13652 (13K) [application/x-debian-package]
Saving to: 'puppetlabs-release-pc1-trusty.deb'

100%[================================================================================================================>] 13,652      44.3KB/s   in 0.3s   

2016-09-10 22:52:49 (44.3 KB/s) - 'puppetlabs-release-pc1-trusty.deb' saved [13652/13652]

ubuntu@node1:~$ sudo dpkg -i puppetlabs-release-pc1-trusty.deb
Selecting previously unselected package puppetlabs-release-pc1.
(Reading database ... 57362 files and directories currently installed.)
Preparing to unpack puppetlabs-release-pc1-trusty.deb ...
Unpacking puppetlabs-release-pc1 (1.1.0-2trusty) ...
Setting up puppetlabs-release-pc1 (1.1.0-2trusty) ...
ubuntu@node1:~$ sudo apt-get update
Hit https://apt.dockerproject.org ubuntu-trusty InRelease
Ign http://apt.puppetlabs.com trusty InRelease                         
Hit https://apt.dockerproject.org ubuntu-trusty/main amd64 Packages    
Ign http://archive.ubuntu.com trusty InRelease                    
Get:1 https://apt.dockerproject.org ubuntu-trusty/main Translation-en
Get:2 http://apt.puppetlabs.com trusty Release.gpg [841 B]             
Ign https://apt.dockerproject.org ubuntu-trusty/main Translation-en    
Get:3 http://archive.ubuntu.com trusty-updates InRelease [65.9 kB]
Get:4 http://apt.puppetlabs.com trusty Release [54.2 kB]   
Get:5 http://archive.ubuntu.com trusty-security InRelease [65.9 kB]
Get:6 http://apt.puppetlabs.com trusty/PC1 amd64 Packages [15.6 kB]
Hit http://archive.ubuntu.com trusty Release.gpg                      
Get:7 http://archive.ubuntu.com trusty-updates/main amd64 Packages [889 kB]
Get:8 http://archive.ubuntu.com trusty-updates/restricted amd64 Packages [15.9 kB]
Get:9 http://archive.ubuntu.com trusty-updates/universe amd64 Packages [373 kB]
Get:10 http://archive.ubuntu.com trusty-updates/multiverse amd64 Packages [14.8 kB]
Ign http://apt.puppetlabs.com trusty/PC1 Translation-en                 
Get:11 http://archive.ubuntu.com trusty-updates/main Translation-en [431 kB]
Get:12 http://archive.ubuntu.com trusty-updates/multiverse Translation-en [7661 B]
Get:13 http://archive.ubuntu.com trusty-updates/restricted Translation-en [3699 B]
Get:14 http://archive.ubuntu.com trusty-updates/universe Translation-en [197 kB]
Get:15 http://archive.ubuntu.com trusty-security/main amd64 Packages [524 kB]  
Get:16 http://archive.ubuntu.com trusty-security/restricted amd64 Packages [13.0 kB]
Get:17 http://archive.ubuntu.com trusty-security/universe amd64 Packages [136 kB]
Get:18 http://archive.ubuntu.com trusty-security/multiverse amd64 Packages [4990 B]
Get:19 http://archive.ubuntu.com trusty-security/main Translation-en [288 kB]  
Get:20 http://archive.ubuntu.com trusty-security/multiverse Translation-en [2570 B]
Get:21 http://archive.ubuntu.com trusty-security/restricted Translation-en [3206 B]
Get:22 http://archive.ubuntu.com trusty-security/universe Translation-en [81.3 kB]
Hit http://archive.ubuntu.com trusty Release                                   
Hit http://archive.ubuntu.com trusty/main amd64 Packages                       
Hit http://archive.ubuntu.com trusty/restricted amd64 Packages                 
Hit http://archive.ubuntu.com trusty/universe amd64 Packages                   
Hit http://archive.ubuntu.com trusty/multiverse amd64 Packages                 
Hit http://archive.ubuntu.com trusty/main Translation-en                       
Hit http://archive.ubuntu.com trusty/multiverse Translation-en                 
Hit http://archive.ubuntu.com trusty/restricted Translation-en                 
Hit http://archive.ubuntu.com trusty/universe Translation-en                   
Fetched 3187 kB in 18s (171 kB/s)                                              
Reading package lists... Done
ubuntu@node1:~$ sudo apt-get install puppet-agent
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  puppet-agent
0 upgraded, 1 newly installed, 0 to remove and 12 not upgraded.
Need to get 15.1 MB of archives.
After this operation, 81.8 MB of additional disk space will be used.
Get:1 http://apt.puppetlabs.com/ trusty/PC1 puppet-agent amd64 1.6.2-1trusty [15.1 MB]
Fetched 15.1 MB in 0s (60.7 MB/s) 
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
 LANGUAGE = (unset),
 LC_ALL = (unset),
 LC_TIME = "th_TH.UTF-8",
 LC_MONETARY = "th_TH.UTF-8",
 LC_ADDRESS = "th_TH.UTF-8",
 LC_TELEPHONE = "th_TH.UTF-8",
 LC_NAME = "th_TH.UTF-8",
 LC_MEASUREMENT = "th_TH.UTF-8",
 LC_IDENTIFICATION = "th_TH.UTF-8",
 LC_NUMERIC = "th_TH.UTF-8",
 LC_PAPER = "th_TH.UTF-8",
 LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_ALL to default locale: No such file or directory
Selecting previously unselected package puppet-agent.
(Reading database ... 57367 files and directories currently installed.)
Preparing to unpack .../puppet-agent_1.6.2-1trusty_amd64.deb ...
Unpacking puppet-agent (1.6.2-1trusty) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up puppet-agent (1.6.2-1trusty) ...
update-rc.d: warning:  start runlevel arguments (none) do not match pxp-agent Default-Start values (2 3 4 5)
update-rc.d: warning:  stop runlevel arguments (none) do not match pxp-agent Default-Stop values (0 1 6)
Processing triggers for ureadahead (0.100.0-16) ...
ubuntu@node1:~$ sudo /opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true
2016-09-10 22:54:37.681893 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Notice: /Service[puppet]/ensure: ensure changed 'stopped' to 'running'
service { 'puppet':
  ensure => 'running',
  enable => 'true',
}
ubuntu@node1:~$ sudo /opt/puppetlabs/bin/puppet agent --test
2016-09-10 22:57:11.396529 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for node1
Info: Applying configuration version '1473523040'
Notice: Applied catalog in 0.24 seconds

Sign certificates on the CA master


nutt@nutt-pc:~/Downloads$ sudo /opt/puppetlabs/bin/puppet cert list
Warning: Facter: Could not process routing table entry: Expected a destination followed by key/value pairs, got '192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 linkdown'
  "node1" (SHA256) CE:54:74:42:95:4A:C5:44:20:90:23:26:3C:63:F0:4D:71:79:12:BC:06:CC:0A:6A:ED:DE:E4:BD:AA:77:C2:A3
nutt@nutt-pc:~/Downloads$ sudo /opt/puppetlabs/bin/puppet cert sign node1
Warning: Facter: Could not process routing table entry: Expected a destination followed by key/value pairs, got '192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 linkdown'
Signing Certificate Request for:
  "node1" (SHA256) CE:54:74:42:95:4A:C5:44:20:90:23:26:3C:63:F0:4D:71:79:12:BC:06:CC:0A:6A:ED:DE:E4:BD:AA:77:C2:A3
Notice: Signed certificate request for node1
Notice: Removing file Puppet::SSL::CertificateRequest node1 at '/etc/puppetlabs/puppet/ssl/ca/requests/node1.pem'



Master-Agent Simple Setup


Main file /etc/puppetlabs/code/environments/production/manifests/site.pp

Applies to all nodes

file {'/tmp/example-ip':                                            # resource type file and filename
  ensure  => present,                                               # make sure it exists
  mode    => '0644',                                                # file permissions
  content => "Here is my Public IP Address: ${ipaddress_eth0}.\n",  # note the ipaddress_eth0 fact
}

Applies to specific nodes

node 'ns1', 'ns2' {    # applies to ns1 and ns2 nodes
  file {'/tmp/dns':    # resource type file and filename
    ensure => present, # make sure it exists
    mode => '0644',
    content => "Only DNS servers get this file.\n",
  }
}

node default {}       # applies to nodes that aren't explicitly defined


Resources propagate to the nodes on each agent's schedule, or take effect immediately with 'puppet agent --test'.
With the configuration above, every node gets the file 'example-ip' in /tmp, and ns1 and ns2 also get the file 'dns'.
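
The scoping in site.pp can be mimicked in a few lines of Python: a resource declared outside any node block goes to every node, while a node block only matches the names it lists. This is a toy model for illustration, not Puppet's API, and all names in it are hypothetical.

```python
# Toy model of the site.pp above (not Puppet code).
TOP_LEVEL_FILES = ["/tmp/example-ip"]          # declared outside any node block
NODE_BLOCKS = {("ns1", "ns2"): ["/tmp/dns"]}   # node 'ns1', 'ns2' { ... }


def files_for(node_name):
    """Return the files this toy model would manage on node_name."""
    files = list(TOP_LEVEL_FILES)
    for names, node_files in NODE_BLOCKS.items():
        if node_name in names:
            files.extend(node_files)
    return files


if __name__ == "__main__":
    for node in ("ns1", "node1"):
        print(node, files_for(node))
```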

Puppet Apply (run a manifest file standalone)


Optional: set the PATH environment variable (only the root user can run Puppet)

PATH=/opt/puppetlabs/bin:$PATH;export PATH

docker_example.pp


include 'docker'
docker::run { 'helloworld':
  image   => 'ubuntu:precise',
  command => '/bin/sh -c "while true; do echo hello world; sleep 1; done"',
}




Troubleshooting


Some clients may fail with an error like the one below when running Puppet commands:


root@node2:~/Docker/puppet# facter
2016-09-11 00:21:25.117804 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
2016-09-11 00:21:25.146927 FATAL puppetlabs.facter - unhandled exception: boost::filesystem::current_path: No such file or directory
root@node2:~/Docker/puppet# puppet agent --test
2016-09-11 00:23:03.006097 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
  what():  boost::filesystem::current_path: No such file or directory
Aborted (core dumped)

This happens because the machine is missing a library such as 'libboost-filesystem-dev'.
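
The exception text names boost::filesystem::current_path, which looks up the process's current working directory. One thing worth ruling out, as an assumption rather than a confirmed diagnosis of the log above, is a shell sitting in a directory that has since been removed. A minimal Python check (cwd_is_valid is a hypothetical helper):

```python
import os


def cwd_is_valid():
    """Return True if the current working directory can still be resolved."""
    try:
        os.getcwd()
    except OSError:   # FileNotFoundError is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    if cwd_is_valid():
        print("working directory OK")
    else:
        print("working directory is gone; cd to an existing directory and retry")
```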









Monday, September 5, 2016

Apache Nutch




ubuntu@node2:~$ docker exec -it hbase bash
root@45883500b170:/# 
root@45883500b170:/# 
root@45883500b170:/# 
root@45883500b170:/# useradd nutch -m -s /bin/bash
root@45883500b170:/# passwd nutch
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
root@45883500b170:/# 
root@45883500b170:/# 
root@45883500b170:/# 
root@45883500b170:/# exit
exit
ubuntu@node2:~$ docker exec -it --user nutch hbase bash
nutch@45883500b170:/$ 
nutch@45883500b170:/$ 
nutch@45883500b170:/$ pwd          
/
nutch@45883500b170:/$ cd
nutch@45883500b170:~$ pwd
/home/nutch
nutch@45883500b170:~$ tar xzvf /software/apache-nutch-2.3.1-src.tar.gz 
apache-nutch-2.3.1/conf/
apache-nutch-2.3.1/docs/
apache-nutch-2.3.1/docs/api/
apache-nutch-2.3.1/docs/api/org/
apache-nutch-2.3.1/docs/api/org/apache/
apache-nutch-2.3.1/docs/api/org/apache/nutch/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/lang/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/lang/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/db/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/db/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/misc/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/misc/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/request/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/request/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/response/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/response/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/resources/
...

$NUTCH_HOME/ivy/ivy.xml :

    <dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" />
    <dependency org="org.apache.hbase" name="hbase-common" rev="0.98.8-hadoop2" conf="*->default" />


$NUTCH_HOME/conf/gora.properties :

############################
# HBaseStore properties  #
############################
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true
gora.datastore.scanner.caching=1000
hbase.client.autoflush.default=false


nutch@45883500b170:~/apache-nutch-2.3.1$ ant clean
Buildfile: /home/nutch/apache-nutch-2.3.1/build.xml
Trying to override old definition of task javac
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

clean-build:
   [delete] Deleting directory /home/nutch/apache-nutch-2.3.1/build

clean-lib:

clean-dist:

clean-runtime:

clean:

BUILD SUCCESSFUL
Total time: 0 seconds
nutch@45883500b170:~/apache-nutch-2.3.1$ ant runtime
Buildfile: /home/nutch/apache-nutch-2.3.1/build.xml
Trying to override old definition of task javac
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

ivy-probe-antlib:

ivy-download:
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

ivy-download-unchecked:

ivy-init-antlib:

ivy-init:

init:
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/classes
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/release
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/test
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/test/classes

clean-lib:

resolve-default:
[ivy:resolve] :: Apache Ivy 2.4.0 - 20141213170938 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /home/nutch/apache-nutch-2.3.1/ivy/ivysettings.xml
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/solr/solr-solrj/4.6.0/solr-solrj-4.6.0.jar ...
[ivy:resolve] ...........
[ivy:resolve] .............................
[ivy:resolve] . (393kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.solr#solr-solrj;4.6.0!solr-solrj.jar (4382ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.5.2/hadoop-common-2.5.2.jar ...
[ivy:resolve] .................
[ivy:resolve] ...............................
[ivy:resolve] ................
[ivy:resolve] .......................
[ivy:resolve] ........................
[ivy:resolve] ........................
[ivy:resolve] .........................
[ivy:resolve] .......................
[ivy:resolve] ............................
[ivy:resolve] ......................
[ivy:resolve] ............................
[ivy:resolve] ......................
[ivy:resolve] ............ (2894kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.hadoop#hadoop-common;2.5.2!hadoop-common.jar (21544ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.5.2/hadoop-hdfs-2.5.2.jar ...
[ivy:resolve] ...................................
[ivy:resolve] ...................................
[ivy:resolve] ......................................
[ivy:resolve] .......................................
[ivy:resolve] ....................................
[ivy:resolve] .......................................
[ivy:resolve] ..........................................
[ivy:resolve] .......................................
[ivy:resolve] ..................................
[ivy:resolve] ........................................
[ivy:resolve] ..................................
[ivy:resolve] .........................................
[ivy:resolve] .............................................
[ivy:resolve] ...................................
[ivy:resolve] ......................
[ivy:resolve] .........................................
[ivy:resolve] ..........................................
[ivy:resolve] .............................................
[ivy:resolve] ........................................
[ivy:resolve] .....................................
[ivy:resolve] ............. (6928kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.hadoop#hadoop-hdfs;2.5.2!hadoop-hdfs.jar (33894ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.5.2/hadoop-mapreduce-client-core-2.5.2.jar ...
[ivy:resolve] ....................
[ivy:resolve] .......................
[ivy:resolve] .........................
[ivy:resolve] ..............................
[ivy:resolve] ...............
[ivy:resolve] ...................
[ivy:resolve] ................. (1463kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.hadoop#hadoop-mapreduce-client-core;2.5.2!hadoop-mapreduce-client-core.jar (12531ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.5.2/hadoop-mapreduce-client-jobclient-2.5.2.jar ...
[ivy:resolve] .. (34kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.5.2!hadoop-mapreduce-client-jobclient.jar (1075ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet/2.2.3/org.restlet-2.2.3.jar ...
[ivy:resolve] ......................
[ivy:resolve] ..........................
[ivy:resolve] ......................... (670kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.restlet.jse#org.restlet;2.2.3!org.restlet.jar (7877ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet.ext.jackson/2.2.3/org.restlet.ext.jackson-2.2.3.jar ...
[ivy:resolve] ... (7kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.restlet.jse#org.restlet.ext.jackson;2.2.3!org.restlet.ext.jackson.jar (2971ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet.ext.jaxrs/2.2.3/org.restlet.ext.jaxrs-2.2.3.jar ...
[ivy:resolve] ...................
[ivy:resolve] ............ (305kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.restlet.jse#org.restlet.ext.jaxrs;2.2.3!org.restlet.ext.jaxrs.jar (5760ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar ...
[ivy:resolve] ....................... (239kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] junit#junit;4.11!junit.jar (718ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/hsqldb/hsqldb/2.2.8/hsqldb-2.2.8.jar ...
[ivy:resolve] ............................
[ivy:resolve] .........................
[ivy:resolve] ....................


Configure Nutch

$NUTCH_HOME/runtime/local/conf/nutch-site.xml :

<configuration>
  <property>
    <name>http.agent.name</name>
    <value>Nutty Spider</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|tika|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|indexer-elastic</value>
  </property>
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
  <property>
    <name>elastic.host</name>
    <value>10.0.2.41</value>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
  <property>
    <name>elastic.cluster</name>
    <value>elasticsearch</value>
  </property>
  <property>
    <name>elastic.index</name>
    <value>nutchindex</value>
  </property>
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
  </property>
  <property>
    <name>http.content.limit</name>
    <value>6553600</value>
  </property>
  <property>
    <name>elastic.max.bulk.docs</name>
    <value>250</value>
    <description>Maximum size of the bulk in number of documents.</description>
  </property>
  <property>
    <name>elastic.max.bulk.size</name>
    <value>2500500</value>
    <description>Maximum size of the bulk in bytes.</description>
  </property>
</configuration>
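
To double-check what Nutch will read from this file, the name/value pairs can be pulled out with Python's standard XML parser. A minimal sketch, using a short excerpt of the configuration above as sample input. read_properties is a hypothetical helper, not part of Nutch.

```python
import xml.etree.ElementTree as ET

# Excerpt of the nutch-site.xml shown above
SAMPLE = """<configuration>
  <property>
    <name>elastic.host</name>
    <value>10.0.2.41</value>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return {name: value} for every <property> under the root element."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


if __name__ == "__main__":
    for name, value in read_properties(SAMPLE).items():
        print(name, "=", value)
```

Values come back as strings, so a port such as 9300 needs an int() conversion before numeric use.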


Simple test


nutch@45883500b170:~/apache-nutch-2.3.1/runtime/local/bin$ ./nutch inject ~/nutch/testseed 
InjectorJob: starting at 2016-08-30 09:48:49
InjectorJob: Injecting urlDir: /home/nutch/nutch/testseed
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 0
Injector: finished at 2016-08-30 09:48:58, elapsed: 00:00:08


Crawling the web and indexing with Elasticsearch


9300 - Elasticsearch native Java port
9200 - RESTful API

nutch@45883500b170:~$ cat seed/urls.txt 
https://en.wikipedia.org
nutch@45883500b170:~$ nutch inject seed/urls.txt 
InjectorJob: starting at 2016-08-30 10:24:37
InjectorJob: Injecting urlDir: seed/urls.txt
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2016-08-30 10:24:41, elapsed: 00:00:03
nutch@45883500b170:~$ nutch generate -topN 40
GeneratorJob: starting at 2016-08-30 10:25:02
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 40
GeneratorJob: finished at 2016-08-30 10:25:07, time elapsed: 00:00:04
GeneratorJob: generated batch id: 1472552702-144817008 containing 1 URLs
nutch@45883500b170:~$ nutch fetch -all
FetcherJob: starting at 2016-08-30 10:25:16
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching https://en.wikipedia.org/ (queue crawl delay=5000ms)
-finishing thread FetcherThread2, activeThreads=8
-finishing thread FetcherThread6, activeThreads=6
-finishing thread FetcherThread5, activeThreads=5
-finishing thread FetcherThread7, activeThreads=7
-finishing thread FetcherThread8, activeThreads=7
-finishing thread FetcherThread4, activeThreads=4
-finishing thread FetcherThread3, activeThreads=3
-finishing thread FetcherThread1, activeThreads=2
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread9, activeThreads=9
-finishing thread FetcherThread1, activeThreads=8
-finishing thread FetcherThread2, activeThreads=7
-finishing thread FetcherThread0, activeThreads=1
-finishing thread FetcherThread7, activeThreads=2
-finishing thread FetcherThread6, activeThreads=3
-finishing thread FetcherThread5, activeThreads=4
-finishing thread FetcherThread4, activeThreads=5
-finishing thread FetcherThread3, activeThreads=6
-finishing thread FetcherThread8, activeThreads=0
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: finished at 2016-08-30 10:25:32, time elapsed: 00:00:16
nutch@45883500b170:~$
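
The three commands from the session above (inject, generate, fetch) can be chained in a small script. This is a sketch under the assumption that nutch is on the PATH, as it is in the transcript. run_steps is a hypothetical helper and defaults to a dry run that only returns the argument lists.

```python
import shlex
import subprocess

# The three steps from the session above, in order.
CRAWL_STEPS = [
    "nutch inject seed/urls.txt",
    "nutch generate -topN 40",
    "nutch fetch -all",
]


def run_steps(steps=CRAWL_STEPS, dry_run=True):
    """Run each step in order; with dry_run=True only return the argument lists."""
    commands = [shlex.split(step) for step in steps]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)   # stop at the first failing step
    return commands


if __name__ == "__main__":
    for argv in run_steps():
        print(argv)
```

Call run_steps(dry_run=False) from the directory that contains seed/ to actually execute the steps.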