Friday, December 30, 2016

Docker Daemon



The Docker daemon can listen for Docker Remote API requests on three different types of socket: unix, tcp, and fd.

unix  :- /var/run/docker.sock  (default)

tcp :- for remote access; supports TLS1.0 and greater, e.g.
-H tcp://0.0.0.0:2375          # listen on all IP addresses
-H tcp://192.168.59.103:2375   # listen on a specific IP address
*** Conventional Docker daemon ports: 2375 (unencrypted), 2376 (TLS)

fd :- systemd socket-activated file descriptors, e.g.
-H fd://
-H fd://3

You can specify multiple sockets like this:
dockerd -H unix:///var/run/docker.sock -H tcp://192.168.59.106 -H tcp://10.10.10.2
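
If the daemon is listening on one of the TCP sockets above, the Remote API can be called directly over HTTP. Below is a minimal sketch using Python's requests library, assuming the unencrypted tcp://192.168.59.103:2375 endpoint from the example (with TLS on 2376 you would also need client certificates):

import requests

# Hypothetical daemon address taken from the -H examples above (port 2375, no TLS)
DOCKER_API = "http://192.168.59.103:2375"

# /version and /containers/json are standard Docker Remote API endpoints
version = requests.get(DOCKER_API + "/version").json()
print("Docker version:", version.get("Version"))

# List running containers -- the same data `docker ps` shows
for c in requests.get(DOCKER_API + "/containers/json").json():
    print(c["Id"][:12], c["Image"], c["Status"])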


Thursday, December 22, 2016

How to deploy a Scrapy project to scrapinghub.com and save the crawled data to MongoDB




Required pip module:
pymongo
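
The crawled items get into MongoDB through a Scrapy item pipeline. Below is a minimal sketch of such a pipeline built on pymongo; the MONGO_URI and MONGO_DATABASE setting names and their fallback values are illustrative, not taken from the stack project itself:

# pipelines.py -- minimal MongoDB pipeline sketch
import pymongo

class MongoPipeline(object):

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI / MONGO_DATABASE are hypothetical settings.py keys
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI', 'mongodb://localhost:27017'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'scraping'),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # one document per scraped item, one collection per spider
        self.db[spider.name].insert_one(dict(item))
        return item

Enable it in settings.py with ITEM_PIPELINES = {'stack.pipelines.MongoPipeline': 300} (the dotted path depends on where the class lives in your project).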

ubuntu@nutthaphongmail:~$ docker exec -it --user root scrapyd-server bash
root@6b82490da131:/home/scrapyd# pip install shub
Collecting shub
  Downloading shub-2.5.0-py2.py3-none-any.whl (47kB)
    100% |################################| 51kB 773kB/s 
Collecting requests (from shub)
  Downloading requests-2.12.4-py2.py3-none-any.whl (576kB)
    100% |################################| 583kB 1.3MB/s 
Requirement already satisfied: six in /usr/local/lib/python2.7/dist-packages (from shub)
Collecting scrapinghub (from shub)
  Downloading scrapinghub-1.9.0-py2-none-any.whl
Requirement already satisfied: pip in /usr/local/lib/python2.7/dist-packages (from shub)
Collecting PyYAML (from shub)
  Downloading PyYAML-3.12.tar.gz (253kB)
    100% |################################| 256kB 2.5MB/s 
Collecting click (from shub)
  Downloading click-6.6-py2.py3-none-any.whl (71kB)
    100% |################################| 71kB 4.2MB/s 
Collecting retrying (from shub)
  Downloading retrying-1.3.3.tar.gz
Collecting docker-py (from shub)
  Downloading docker_py-1.10.6-py2.py3-none-any.whl (50kB)
    100% |################################| 51kB 4.2MB/s 
Collecting backports.ssl-match-hostname>=3.5; python_version < "3.5" (from docker-py->shub)
  Downloading backports.ssl_match_hostname-3.5.0.1.tar.gz
Collecting websocket-client>=0.32.0 (from docker-py->shub)
  Downloading websocket_client-0.40.0.tar.gz (196kB)
    100% |################################| 204kB 2.7MB/s 
Requirement already satisfied: ipaddress>=1.0.16; python_version < "3.3" in /usr/local/lib/python2.7/dist-packages (from docker-py->shub)
Collecting docker-pycreds>=0.2.1 (from docker-py->shub)
  Downloading docker_pycreds-0.2.1-py2.py3-none-any.whl
Building wheels for collected packages: PyYAML, retrying, backports.ssl-match-hostname, websocket-client
  Running setup.py bdist_wheel for PyYAML ... done
  Stored in directory: /root/.cache/pip/wheels/2c/f7/79/13f3a12cd723892437c0cfbde1230ab4d82947ff7b3839a4fc
  Running setup.py bdist_wheel for retrying ... done
  Stored in directory: /root/.cache/pip/wheels/d9/08/aa/49f7c109140006ea08a7657640aee3feafb65005bcd5280679
  Running setup.py bdist_wheel for backports.ssl-match-hostname ... done
  Stored in directory: /root/.cache/pip/wheels/5d/72/36/b2a31507b613967b728edc33378a5ff2ada0f62855b93c5ae1
  Running setup.py bdist_wheel for websocket-client ... done
  Stored in directory: /root/.cache/pip/wheels/d1/5e/dd/93da015a0ecc8375278b05ad7f0452eff574a044bcea2a95d2
Successfully built PyYAML retrying backports.ssl-match-hostname websocket-client
Installing collected packages: requests, retrying, scrapinghub, PyYAML, click, backports.ssl-match-hostname, websocket-client, docker-pycreds, docker-py, shub
Successfully installed PyYAML-3.12 backports.ssl-match-hostname-3.5.0.1 click-6.6 docker-py-1.10.6 docker-pycreds-0.2.1 requests-2.12.4 retrying-1.3.3 scrapinghub-1.9.0 shub-2.5.0 websocket-client-0.40.0
root@6b82490da131:/home/scrapyd# exit
exit
ubuntu@nutthaphongmail:~$ docker exec -it scrapyd-server bash
scrapyd@6b82490da131:~$ 
scrapyd@6b82490da131:~$ shub
Usage: shub [OPTIONS] COMMAND [ARGS]...

  shub is the Scrapinghub command-line client. It allows you to deploy
  projects or dependencies, schedule spiders, and retrieve scraped data or
  logs without leaving the command line.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  copy-eggs     Sync eggs from one project with other project
  deploy        Deploy Scrapy project to Scrapy Cloud
  deploy-egg    [DEPRECATED] Build and deploy egg from source
  deploy-reqs   [DEPRECATED] Build and deploy eggs from requirements.txt
  fetch-eggs    Download project eggs from Scrapy Cloud
  image         Manage project based on custom Docker image
  items         Fetch items from Scrapy Cloud
  log           Fetch log from Scrapy Cloud
  login         Save your Scrapinghub API key
  logout        Forget saved Scrapinghub API key
  migrate-eggs  Migrate dash eggs to requirements.txt and project's directory
  requests      Fetch requests from Scrapy Cloud
  schedule      Schedule a spider to run on Scrapy Cloud
  version       Show shub version

  For usage and help on a specific command, run it with a --help flag, e.g.:

      shub schedule --help
scrapyd@6b82490da131:~$ 
scrapyd@6b82490da131:~$ 
scrapyd@6b82490da131:~$ pwd
/home/scrapyd
scrapyd@6b82490da131:~/projects/PyLearning$ cd stack/
scrapyd@6b82490da131:~/projects/PyLearning/stack$ cat requirements.txt 
pymongo==3.4.0
scrapyd@6b82490da131:~/projects/PyLearning/stack$ 
scrapyd@6b82490da131:~/projects/PyLearning/stack$ 
scrapyd@6b82490da131:~/projects/PyLearning/stack$ 
scrapyd@6b82490da131:~/projects/PyLearning/stack$ cat scrapinghub.yml 
projects:
  default: 136494
requirements_file: requirements.txt
scrapyd@6b82490da131:~/projects/PyLearning/stack$ 
scrapyd@6b82490da131:~/projects/PyLearning/stack$ shub login
-------------------------------------------------------------------------------
Welcome to shub version 2!

This release contains major updates to how shub is configured, as well as
updates to the commands and shub's look & feel.

Run 'shub' to get an overview over all available commands, and
'shub command --help' to get detailed help on a command.

Definitely try the new 'shub items -f [JOBID]' to see items live as they are
being scraped!

From now on, shub configuration should be done in a file called
'scrapinghub.yml', living next to the previously used 'scrapy.cfg' in your
Scrapy project directory. Global configuration, for example API keys, should be
done in a file called '.scrapinghub.yml' in your home directory.

But no worries, shub has automatically migrated your global settings to
~/.scrapinghub.yml, and will also automatically migrate your project settings
when you run a command within a Scrapy project.

Visit http://doc.scrapinghub.com/shub.html for more information on the new
configuration format and its benefits.

Happy scraping!
-------------------------------------------------------------------------------
Enter your API key from https://app.scrapinghub.com/account/apikey
API key: e4bfa1fd7f8d4d9da817aa112bb82095
Validating API key...
API key is OK, you are logged in now.
scrapyd@6b82490da131:~/projects/PyLearning/stack$ 
scrapyd@6b82490da131:~/projects/PyLearning/stack$ shub deploy
Target project ID: 136494
Save as default [Y/n]: Y
Project 136494 was set as default in scrapinghub.yml.
You can deploy to it via 'shub deploy' from now on.
Packing version e7d7f6c-inet1
Deploying to Scrapy Cloud project "136494"
{"spiders": 1, "status": "ok", "project": 136494, "version": "e7d7f6c-inet1"}
Run your spiders at: https://app.scrapinghub.com/p/136494/

* Your API key is at https://app.scrapinghub.com/account/apikey
** For a new project, the project ID can be found at https://app.scrapinghub.com/p/PROJECT_ID/deploy
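
Once deployed, a spider can also be scheduled over HTTP instead of with 'shub schedule'. A sketch against the Scrapinghub schedule.json endpoint, assuming project 136494 from above; the spider name and the API key placeholder are illustrative:

import requests

APIKEY = "YOUR_API_KEY"   # from https://app.scrapinghub.com/account/apikey
PROJECT = 136494          # project ID from scrapinghub.yml above

# schedule.json starts a job; auth is HTTP basic with the API key as username
resp = requests.post(
    "https://app.scrapinghub.com/api/schedule.json",
    auth=(APIKEY, ""),
    data={"project": PROJECT, "spider": "stack"},  # spider name is illustrative
)
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}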









Tuesday, November 22, 2016

Deploy dataset project to scrapyd







scrapyd@a4c2642d74db:~/projects$ git clone https://github.com/nutthaphon/PyLearning.git
Cloning into 'PyLearning'...
remote: Counting objects: 323, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 323 (delta 0), reused 0 (delta 0), pack-reused 317
Receiving objects: 100% (323/323), 61.27 KiB | 0 bytes/s, done.
Resolving deltas: 100% (146/146), done.
Checking connectivity... done.
scrapyd@a4c2642d74db:~/projects$ 
scrapyd@a4c2642d74db:~/projects$ 
scrapyd@a4c2642d74db:~/projects$ 
scrapyd@a4c2642d74db:~/projects$ cd PyLearning/
scrapyd@a4c2642d74db:~/projects/PyLearning$ ls
Animals  CherryPy  DJango  README.md  Scraping  dataset  decor  foo  serial  test  tutorial
scrapyd@a4c2642d74db:~/projects/PyLearning$ cd dataset/
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ls
__init__.py  __init__.pyc  build  project.egg-info  scrapinghub.yml  scrapy.cfg  settrade  setup.py
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ sed -i 's/#url/url/g' scrapy.cfg
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ cat scrapy.cfg
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = settrade.settings

[deploy]
url = http://localhost:6800/
project = settrade
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ls
__init__.py  __init__.pyc  build  project.egg-info  scrapinghub.yml  scrapy.cfg  settrade  setup.py
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ cd settrade/
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade$ ls
DailyStockQuote.py   QueryStockSymbol.py   ThinkSpeakChannels.py   __init__.py   items.py      scrapinghub.yml  settings.pyc
DailyStockQuote.pyc  QueryStockSymbol.pyc  ThinkSpeakChannels.pyc  __init__.pyc  pipelines.py  settings.py      spiders
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade$ cd spiders/
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade/spiders$ ls
SettradeSpider.py  SettradeSpider.pyc  __init__.py  __init__.pyc
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade/spiders$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade/spiders$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade/spiders$ ls
SettradeSpider.py  SettradeSpider.pyc  __init__.py  __init__.pyc
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade/spiders$ cd ..
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade$ cd ..
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ls
__init__.py  __init__.pyc  build  project.egg-info  scrapinghub.yml  scrapy.cfg  settrade  setup.py
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy -l
default              http://localhost:6800/
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy default -p dataset
Packing version 1479804720
Deploying to project "dataset" in http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "dataset", "version": "1479804720", "spiders": 1, "node_name": "a4c2642d74db"}

scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy -L default
tutorial
dataset
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ curl http://localhost:6800/listprojects.json
{"status": "ok", "projects": ["tutorial", "dataset"], "node_name": "a4c2642d74db"}
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ curl http://localhost:6800/listspiders.json?project=dataset
{"status": "ok", "spiders": ["settrade_dataset"], "node_name": "a4c2642d74db"}
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ 
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ curl http://localhost:6800/schedule.json -d project=dataset -d spider=settrade_dataset -d setting=DOWNLOAD_DELAY=5 -d start_urls=http://www.settrade.com/servlet/IntradayStockChartDataServlet?symbol=INTUCH,http://www.settrade.com/servlet/IntradayStockChartDataServlet?symbol=CPF,http://www.settrade.com/servlet/IntradayStockChartDataServlet?symbol=ADVANC
{"status": "ok", "jobid": "c60e4566b09111e68c380242ac110002", "node_name": "a4c2642d74db"}



Scrapyd deploying your project








ubuntu@node2:~/Docker/nutthaphon/scrapyd/user$ docker run --name scrapyd-server --user scrapyd -it -P nutthaphon/scrapyd:1.1.1 bash
scrapyd@a4c2642d74db:~$ pwd
/home/scrapyd
scrapyd@a4c2642d74db:~$ ls
master.zip  scrapyd-client-master  setuptools-28.8.0.zip
scrapyd@a4c2642d74db:~$ mkdir projects
scrapyd@a4c2642d74db:~$ cd projects/
scrapyd@a4c2642d74db:~/projects$ scrapy startproject tutorial
New Scrapy project 'tutorial', using template directory '/usr/local/lib/python2.7/dist-packages/scrapy/templates/project', created in:
    /home/scrapyd/projects/tutorial

You can start your first spider with:
    cd tutorial
    scrapy genspider example example.com
scrapyd@a4c2642d74db:~/projects$ cd tutorial/
scrapyd@a4c2642d74db:~/projects/tutorial$ ls
scrapy.cfg  tutorial
scrapyd@a4c2642d74db:~/projects/tutorial$ vi scrapy.cfg 
bash: vi: command not found
scrapyd@a4c2642d74db:~/projects/tutorial$ cat scrapy.cfg 
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = tutorial.settings

[deploy]
#url = http://localhost:6800/
project = tutorial
scrapyd@a4c2642d74db:~/projects/tutorial$ sed -i 's/#utl/url/g' scrapy.cfg 
scrapyd@a4c2642d74db:~/projects/tutorial$ cat scrapy.cfg 
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = tutorial.settings

[deploy]
#url = http://localhost:6800/
project = tutorial
scrapyd@a4c2642d74db:~/projects/tutorial$ sed -i 's/#url/url/g' scrapy.cfg 
scrapyd@a4c2642d74db:~/projects/tutorial$ cat scrapy.cfg 
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = tutorial.settings

[deploy]
url = http://localhost:6800/
project = tutorial
scrapyd@a4c2642d74db:~/projects/tutorial$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy -l
default              http://localhost:6800/
scrapyd@a4c2642d74db:~/projects/tutorial$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy default -p tutorial
Packing version 1479799362
Deploying to project "tutorial" in http://localhost:6800/addversion.json
Deploy failed: <urlopen error [Errno 111] Connection refused>
scrapyd@a4c2642d74db:~/projects/tutorial$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy default -p tutorial
Packing version 1479799403
Deploying to project "tutorial" in http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "tutorial", "version": "1479799403", "spiders": 0, "node_name": "a4c2642d74db"}

scrapyd@a4c2642d74db:~/projects/tutorial$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy -L default
tutorial
* The first deploy attempt failed with 'Connection refused', most likely because the Scrapyd service was not yet listening on port 6800; the retry a moment later succeeded.







Wednesday, October 26, 2016

Wildfly and JSF




Checking existing JSF subsystem and default module
[jboss@69c9222702f3 bin]$ ./jboss-cli.sh 
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect 10.80.1.209:9990
Authenticating against security realm: ManagementRealm
Username: admin
Password: 
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:list-active-jsf-impls
{
    "outcome" => "success",
    "result" => ["main"]
}
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:read-attribute(name=default-jsf-impl-slot)
{
    "outcome" => "success",
    "result" => "main"
}
[standalone@10.80.1.209:9990 /] 
* This shows that only one module is installed, named "main"

Deploy the new v1.2 implementation and restart Wildfly
[standalone@10.80.1.209:9990 /] deploy /install-mojarra-1.2_15.cli
* This CLI file needs to be built with Maven following the steps on the reference site below, or the files can be downloaded from GitHub


Verify the installed modules, change the default to the newly added one, and restart Wildfly

[jboss@69c9222702f3 bin]$ ./jboss-cli.sh 
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect 10.80.1.209:9990
Authenticating against security realm: ManagementRealm
Username: admin
Password: 
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:list-active-jsf-impls
{
    "outcome" => "success",
    "result" => [
        "mojarra-1.2_15",
        "main"
    ]
}
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:read-attribute(name=default-jsf-impl-slot)
{
    "outcome" => "success",
    "result" => "main"
}
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:write-attribute(name=default-jsf-impl-slot,value=mojarra-1.2_15)
{
    "outcome" => "success",
    "response-headers" => {
        "operation-requires-reload" => true,
        "process-state" => "reload-required"
    }
}
[standalone@10.80.1.209:9990 /] 

Confirm the default is now JSF 1.2

[jboss@69c9222702f3 bin]$ ./jboss-cli.sh 
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect 10.80.1.209:9990
Authenticating against security realm: ManagementRealm
Username: admin
Password: 
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:read-attribute(name=default-jsf-impl-slot)
{
    "outcome" => "success",
    "result" => "mojarra-1.2_15"
}
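
The same read-attribute check can be scripted against WildFly's HTTP management interface on port 9990, which accepts CLI operations as JSON. A sketch using Python requests with digest authentication; the host and the admin user are the ones from the CLI sessions above, the password is a placeholder:

import requests
from requests.auth import HTTPDigestAuth

# The HTTP management API mirrors CLI operations as JSON documents
op = {
    "operation": "read-attribute",
    "address": [{"subsystem": "jsf"}],
    "name": "default-jsf-impl-slot",
}
resp = requests.post(
    "http://10.80.1.209:9990/management",
    json=op,
    auth=HTTPDigestAuth("admin", "PASSWORD"),  # ManagementRealm user from above
)
print(resp.json())  # expected: {"outcome": "success", "result": "mojarra-1.2_15"}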


Test JSF 1.2 Web Application


Reference Site : Steps to add any new JSF implementation or version to WildFly





Tuesday, October 11, 2016

JMeter on containers





nutt@nutt-pc:~$ docker network create --driver=bridge --subnet=172.27.0.0/16 \
> --ip-range=172.27.5.0/24 \
> --gateway=172.27.5.254 \
> my-net
1721e834110c184f92ce2ae1008bf5f5dcd735f38a7d35f6bbcc9eebc4d5de8e
nutt@nutt-pc:~$ docker network inspect my-net
[
    {
        "Name": "my-net",
        "Id": "1721e834110c184f92ce2ae1008bf5f5dcd735f38a7d35f6bbcc9eebc4d5de8e",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.27.0.0/16",
                    "IPRange": "172.27.5.0/24",
                    "Gateway": "172.27.5.254"
                }
            ]
        },
        "Internal": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]
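
The same network can also be created programmatically with the docker-py 1.x client (installed earlier as a dependency of shub). A sketch assuming the default local daemon socket:

import docker
from docker.utils import create_ipam_config, create_ipam_pool

client = docker.Client(base_url="unix://var/run/docker.sock")

# Mirror the CLI flags: --subnet, --ip-range and --gateway
pool = create_ipam_pool(subnet="172.27.0.0/16",
                        iprange="172.27.5.0/24",
                        gateway="172.27.5.254")
ipam = create_ipam_config(pool_configs=[pool])

net = client.create_network("my-net", driver="bridge", ipam=ipam)
print(net["Id"])
print(client.inspect_network(net["Id"])["IPAM"])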


nutt@nutt-pc:~$ docker run -it --net="my-net" --add-host="jmeter-server:172.27.5.1" --mac-address="9a:17:a0:c7:b4:cb" --ip="172.27.5.1" --name jmeter-server1 ubuntu bash
root@c07696364171:/# 
root@c07696364171:/# 
root@c07696364171:/# 
root@c07696364171:/# 
root@c07696364171:/# apt-get update
Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [95.7 kB]
Get:3 http://archive.ubuntu.com/ubuntu xenial-security InRelease [94.5 kB]
Get:4 http://archive.ubuntu.com/ubuntu xenial/main Sources [1103 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial/restricted Sources [5179 B]
Get:6 http://archive.ubuntu.com/ubuntu xenial/universe Sources [9802 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:8 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:9 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:10 http://archive.ubuntu.com/ubuntu xenial-updates/main Sources [244 kB]   
Get:11 http://archive.ubuntu.com/ubuntu xenial-updates/universe Sources [124 kB]
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [506 kB]
Get:13 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [418 kB]
Get:14 http://archive.ubuntu.com/ubuntu xenial-security/main Sources [51.4 kB] 
Get:15 http://archive.ubuntu.com/ubuntu xenial-security/universe Sources [11.1 kB]
Get:16 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages [191 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [52.5 kB]
Fetched 24.3 MB in 11s (2109 kB/s)                                             
Reading package lists... Done


Create the Dockerfile



Monday, September 19, 2016

Python Modules and Packages





Working directory

PyLearning
├── Animals
├── foo
├── README.md
└── test


foo/
├── bar.py
├── fibo.py
├── fibo.pyc
├── __init__.py
└── __init__.pyc


test/
├── Backwards.py
├── Backwards.pyc
├── callFibo.py
├── callFibo.pyc
├── Card.py
├── Foo.py
├── Foo.pyc
├── __init__.py
├── __init__.pyc
├── mystuff.py
├── mystuff.pyc
└── support.py



fibo.py
def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

def fib2(n):   # build the Fibonacci series up to n, then print it as a list
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    print result

def fib3(n):    # identical to fib(): write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b




callFibo.py

from foo.fibo import fib, fib2, fib3

print('Call to fib3()')
fib3(100)
print('\n')

print('Call to fib2()')
fib2(100)
print('\n')

print('Call to fib()')
fib(1000)
print('\n')   

print('Call to instance of fib()')
fib=fib   # rebind the imported fib to a local name (a no-op; it calls the same function)
fib(500)

Output

Call to fib3()
1 1 2 3 5 8 13 21 34 55 89 

Call to fib2()
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]


Call to fib()
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 

Call to instance of fib()
1 1 2 3 5 8 13 21 34 55 89 144 233 377
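
For reference, the same module can be imported in a few equivalent styles; a short sketch (run it from the PyLearning directory so the foo package with its __init__.py is importable):

import foo.fibo                  # import the module, keep the package prefix
foo.fibo.fib(100)

from foo import fibo             # bind the module name directly
fibo.fib(100)

from foo.fibo import fib as fibonacci   # import one function under a new name
fibonacci(100)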



Wednesday, September 14, 2016

Ubuntu Port forwarding




Port forwarding

These DNAT rules redirect TCP traffic arriving at the host 192.168.1.106 on ports 9000, 9090, and 50070 to the internal machine 10.0.2.129; the FORWARD rule then lets the redirected packets through (IP forwarding must also be enabled, e.g. net.ipv4.ip_forward=1):

sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 9000 -j DNAT --to-destination 10.0.2.129:9000
sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 9090 -j DNAT --to-destination 10.0.2.129:9090
sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 50070 -j DNAT --to-destination 10.0.2.129:50070
sudo iptables -I FORWARD -m state -d 10.0.2.0/24 --state NEW,RELATED,ESTABLISHED -j ACCEPT
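
A quick way to verify a forwarded port from another machine is a plain TCP connect. A small Python sketch using one of the host/port pairs from the rules above:

import socket

# A successful connect to the host address means DNAT delivered us to 10.0.2.129
try:
    s = socket.create_connection(("192.168.1.106", 9000), timeout=5)
    print("port 9000 is forwarded and reachable")
    s.close()
except socket.error as e:
    print("connection failed:", e)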





Saturday, September 10, 2016

Puppet



Puppet has two distributions:

1. The distro-packaged Puppet (WEBrick/Apache based), where the services are named puppetmaster, puppetagent, etc.
2. The Puppet Labs packages (used in this tutorial)


Puppet Server: Installing From Packages

Puppet Collections and packages

$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-xenial.deb
$ sudo dpkg -i puppetlabs-release-pc1-xenial.deb
$ sudo apt update
$ sudo apt-get install puppetserver
$ sudo systemctl start puppetserver


Puppet agent: Linux

Puppet Collections and packages

$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
$ sudo dpkg -i puppetlabs-release-pc1-trusty.deb

$ sudo apt-get update

** Before installing and configuring the agent, do not forget to add the Puppet Server (master) to /etc/hosts on the agent side. The default name of the Puppet Server is 'puppet', so map that name to the correct IP address.

ubuntu@node1:~$ sudo vi /etc/hosts
ubuntu@node1:~$ 
ubuntu@node1:~$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
--2016-09-10 22:52:47--  https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
Resolving apt.puppetlabs.com (apt.puppetlabs.com)... 198.58.114.168, 2600:3c00::f03c:91ff:fe69:6bf0
Connecting to apt.puppetlabs.com (apt.puppetlabs.com)|198.58.114.168|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13652 (13K) [application/x-debian-package]
Saving to: 'puppetlabs-release-pc1-trusty.deb'

100%[================================================================================================================>] 13,652      44.3KB/s   in 0.3s   

2016-09-10 22:52:49 (44.3 KB/s) - 'puppetlabs-release-pc1-trusty.deb' saved [13652/13652]

ubuntu@node1:~$ sudo dpkg -i puppetlabs-release-pc1-trusty.deb
Selecting previously unselected package puppetlabs-release-pc1.
(Reading database ... 57362 files and directories currently installed.)
Preparing to unpack puppetlabs-release-pc1-trusty.deb ...
Unpacking puppetlabs-release-pc1 (1.1.0-2trusty) ...
Setting up puppetlabs-release-pc1 (1.1.0-2trusty) ...
ubuntu@node1:~$ sudo apt-get update
Hit https://apt.dockerproject.org ubuntu-trusty InRelease
Ign http://apt.puppetlabs.com trusty InRelease                         
Hit https://apt.dockerproject.org ubuntu-trusty/main amd64 Packages    
Ign http://archive.ubuntu.com trusty InRelease                    
Get:1 https://apt.dockerproject.org ubuntu-trusty/main Translation-en
Get:2 http://apt.puppetlabs.com trusty Release.gpg [841 B]             
Ign https://apt.dockerproject.org ubuntu-trusty/main Translation-en    
Get:3 http://archive.ubuntu.com trusty-updates InRelease [65.9 kB]
Get:4 http://apt.puppetlabs.com trusty Release [54.2 kB]   
Get:5 http://archive.ubuntu.com trusty-security InRelease [65.9 kB]
Get:6 http://apt.puppetlabs.com trusty/PC1 amd64 Packages [15.6 kB]
Hit http://archive.ubuntu.com trusty Release.gpg                      
Get:7 http://archive.ubuntu.com trusty-updates/main amd64 Packages [889 kB]
Get:8 http://archive.ubuntu.com trusty-updates/restricted amd64 Packages [15.9 kB]
Get:9 http://archive.ubuntu.com trusty-updates/universe amd64 Packages [373 kB]
Get:10 http://archive.ubuntu.com trusty-updates/multiverse amd64 Packages [14.8 kB]
Ign http://apt.puppetlabs.com trusty/PC1 Translation-en                 
Get:11 http://archive.ubuntu.com trusty-updates/main Translation-en [431 kB]
Get:12 http://archive.ubuntu.com trusty-updates/multiverse Translation-en [7661 B]
Get:13 http://archive.ubuntu.com trusty-updates/restricted Translation-en [3699 B]
Get:14 http://archive.ubuntu.com trusty-updates/universe Translation-en [197 kB]
Get:15 http://archive.ubuntu.com trusty-security/main amd64 Packages [524 kB]  
Get:16 http://archive.ubuntu.com trusty-security/restricted amd64 Packages [13.0 kB]
Get:17 http://archive.ubuntu.com trusty-security/universe amd64 Packages [136 kB]
Get:18 http://archive.ubuntu.com trusty-security/multiverse amd64 Packages [4990 B]
Get:19 http://archive.ubuntu.com trusty-security/main Translation-en [288 kB]  
Get:20 http://archive.ubuntu.com trusty-security/multiverse Translation-en [2570 B]
Get:21 http://archive.ubuntu.com trusty-security/restricted Translation-en [3206 B]
Get:22 http://archive.ubuntu.com trusty-security/universe Translation-en [81.3 kB]
Hit http://archive.ubuntu.com trusty Release                                   
Hit http://archive.ubuntu.com trusty/main amd64 Packages                       
Hit http://archive.ubuntu.com trusty/restricted amd64 Packages                 
Hit http://archive.ubuntu.com trusty/universe amd64 Packages                   
Hit http://archive.ubuntu.com trusty/multiverse amd64 Packages                 
Hit http://archive.ubuntu.com trusty/main Translation-en                       
Hit http://archive.ubuntu.com trusty/multiverse Translation-en                 
Hit http://archive.ubuntu.com trusty/restricted Translation-en                 
Hit http://archive.ubuntu.com trusty/universe Translation-en                   
Fetched 3187 kB in 18s (171 kB/s)                                              
Reading package lists... Done
ubuntu@node1:~$ sudo apt-get install puppet-agent
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  puppet-agent
0 upgraded, 1 newly installed, 0 to remove and 12 not upgraded.
Need to get 15.1 MB of archives.
After this operation, 81.8 MB of additional disk space will be used.
Get:1 http://apt.puppetlabs.com/ trusty/PC1 puppet-agent amd64 1.6.2-1trusty [15.1 MB]
Fetched 15.1 MB in 0s (60.7 MB/s) 
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
 LANGUAGE = (unset),
 LC_ALL = (unset),
 LC_TIME = "th_TH.UTF-8",
 LC_MONETARY = "th_TH.UTF-8",
 LC_ADDRESS = "th_TH.UTF-8",
 LC_TELEPHONE = "th_TH.UTF-8",
 LC_NAME = "th_TH.UTF-8",
 LC_MEASUREMENT = "th_TH.UTF-8",
 LC_IDENTIFICATION = "th_TH.UTF-8",
 LC_NUMERIC = "th_TH.UTF-8",
 LC_PAPER = "th_TH.UTF-8",
 LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_ALL to default locale: No such file or directory
Selecting previously unselected package puppet-agent.
(Reading database ... 57367 files and directories currently installed.)
Preparing to unpack .../puppet-agent_1.6.2-1trusty_amd64.deb ...
Unpacking puppet-agent (1.6.2-1trusty) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up puppet-agent (1.6.2-1trusty) ...
update-rc.d: warning:  start runlevel arguments (none) do not match pxp-agent Default-Start values (2 3 4 5)
update-rc.d: warning:  stop runlevel arguments (none) do not match pxp-agent Default-Stop values (0 1 6)
Processing triggers for ureadahead (0.100.0-16) ...
ubuntu@node1:~$ sudo /opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true
2016-09-10 22:54:37.681893 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Notice: /Service[puppet]/ensure: ensure changed 'stopped' to 'running'
service { 'puppet':
  ensure => 'running',
  enable => 'true',
}
ubuntu@node1:~$ sudo /opt/puppetlabs/bin/puppet agent --test
2016-09-10 22:57:11.396529 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for node1
Info: Applying configuration version '1473523040'
Notice: Applied catalog in 0.24 seconds

Sign certificates on the CA master


nutt@nutt-pc:~/Downloads$ sudo /opt/puppetlabs/bin/puppet cert list
Warning: Facter: Could not process routing table entry: Expected a destination followed by key/value pairs, got '192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 linkdown'
  "node1" (SHA256) CE:54:74:42:95:4A:C5:44:20:90:23:26:3C:63:F0:4D:71:79:12:BC:06:CC:0A:6A:ED:DE:E4:BD:AA:77:C2:A3
nutt@nutt-pc:~/Downloads$ sudo /opt/puppetlabs/bin/puppet cert sign node1
Warning: Facter: Could not process routing table entry: Expected a destination followed by key/value pairs, got '192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1 linkdown'
Signing Certificate Request for:
  "node1" (SHA256) CE:54:74:42:95:4A:C5:44:20:90:23:26:3C:63:F0:4D:71:79:12:BC:06:CC:0A:6A:ED:DE:E4:BD:AA:77:C2:A3
Notice: Signed certificate request for node1
Notice: Removing file Puppet::SSL::CertificateRequest node1 at '/etc/puppetlabs/puppet/ssl/ca/requests/node1.pem'



Master-Agent Simple Setup


Main file /etc/puppetlabs/code/environments/production/manifests/site.pp

All node effects

file {'/tmp/example-ip':                                            # resource type file and filename
  ensure  => present,                                               # make sure it exists
  mode    => '0644',                                                # file permissions
  content => "Here is my Public IP Address: ${ipaddress_eth0}.\n",  # note the ipaddress_eth0 fact
}

Specific node effects

node 'ns1', 'ns2' {    # applies to ns1 and ns2 nodes
  file {'/tmp/dns':    # resource type file and filename
    ensure => present, # make sure it exists
    mode => '0644',
    content => "Only DNS servers get this file.\n",
  }
}

node default {}       # applies to nodes that aren't explicitly defined


Resources propagate to nodes on their regular check-in schedule, or can be applied immediately with 'puppet agent --test'.
As a result of the configuration above, every node gets /tmp/example-ip, while only ns1 and ns2 also get /tmp/dns.

Puppet Apply (run a Puppet file standalone)


Optional: put the Puppet binaries on your PATH. Note that only the root user can run Puppet:

PATH=/opt/puppetlabs/bin:$PATH;export PATH

docker_example.pp


include 'docker'
docker::run { 'helloworld':
  image   => 'ubuntu:precise',
  command => '/bin/sh -c "while true; do echo hello world; sleep 1; done"',
}




Troubleshooting


Some clients may hit an error like the one below when running Puppet commands:


root@node2:~/Docker/puppet# facter
2016-09-11 00:21:25.117804 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
2016-09-11 00:21:25.146927 FATAL puppetlabs.facter - unhandled exception: boost::filesystem::current_path: No such file or directory
root@node2:~/Docker/puppet# puppet agent --test
2016-09-11 00:23:03.006097 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
  what():  boost::filesystem::current_path: No such file or directory
Aborted (core dumped)

This happens when the server is missing a library such as 'libboost-filesystem-dev'.









Monday, September 5, 2016

Apache Nutch




ubuntu@node2:~$ docker exec -it hbase bash
root@45883500b170:/# 
root@45883500b170:/# 
root@45883500b170:/# 
root@45883500b170:/# useradd nutch -m -s /bin/bash
root@45883500b170:/# passwd nutch
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
root@45883500b170:/# 
root@45883500b170:/# 
root@45883500b170:/# 
root@45883500b170:/# exit
exit
ubuntu@node2:~$ docker exec -it --user nutch hbase bash
nutch@45883500b170:/$ 
nutch@45883500b170:/$ 
nutch@45883500b170:/$ pwd          
/
nutch@45883500b170:/$ cd
nutch@45883500b170:~$ pwd
/home/nutch
nutch@45883500b170:~$ tar xzvf /software/apache-nutch-2.3.1-src.tar.gz 
apache-nutch-2.3.1/conf/
apache-nutch-2.3.1/docs/
apache-nutch-2.3.1/docs/api/
apache-nutch-2.3.1/docs/api/org/
apache-nutch-2.3.1/docs/api/org/apache/
apache-nutch-2.3.1/docs/api/org/apache/nutch/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/lang/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/lang/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/db/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/db/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/misc/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/misc/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/request/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/request/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/response/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/response/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/resources/
...

$NUTCH_HOME/ivy/ivy.xml :

<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" />
    <dependency org="org.apache.hbase" name="hbase-common" rev="0.98.8-hadoop2" conf="*->default" />


$NUTCH_HOME/conf/gora.properties :

############################
# HBaseStore properties  #
############################
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true
gora.datastore.scanner.caching=1000
hbase.client.autoflush.default=false


nutch@45883500b170:~/apache-nutch-2.3.1$ ant clean
Buildfile: /home/nutch/apache-nutch-2.3.1/build.xml
Trying to override old definition of task javac
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

clean-build:
   [delete] Deleting directory /home/nutch/apache-nutch-2.3.1/build

clean-lib:

clean-dist:

clean-runtime:

clean:

BUILD SUCCESSFUL
Total time: 0 seconds
nutch@45883500b170:~/apache-nutch-2.3.1$ ant runtime
Buildfile: /home/nutch/apache-nutch-2.3.1/build.xml
Trying to override old definition of task javac
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

ivy-probe-antlib:

ivy-download:
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

ivy-download-unchecked:

ivy-init-antlib:

ivy-init:

init:
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/classes
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/release
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/test
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/test/classes

clean-lib:

resolve-default:
[ivy:resolve] :: Apache Ivy 2.4.0 - 20141213170938 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /home/nutch/apache-nutch-2.3.1/ivy/ivysettings.xml
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/solr/solr-solrj/4.6.0/solr-solrj-4.6.0.jar ...
[ivy:resolve] ...........
[ivy:resolve] .............................
[ivy:resolve] . (393kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.solr#solr-solrj;4.6.0!solr-solrj.jar (4382ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.5.2/hadoop-common-2.5.2.jar ...
[ivy:resolve] .................
[ivy:resolve] ...............................
[ivy:resolve] ................
[ivy:resolve] .......................
[ivy:resolve] ........................
[ivy:resolve] ........................
[ivy:resolve] .........................
[ivy:resolve] .......................
[ivy:resolve] ............................
[ivy:resolve] ......................
[ivy:resolve] ............................
[ivy:resolve] ......................
[ivy:resolve] ............ (2894kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.hadoop#hadoop-common;2.5.2!hadoop-common.jar (21544ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.5.2/hadoop-hdfs-2.5.2.jar ...
[ivy:resolve] ...................................
[ivy:resolve] ...................................
[ivy:resolve] ......................................
[ivy:resolve] .......................................
[ivy:resolve] ....................................
[ivy:resolve] .......................................
[ivy:resolve] ..........................................
[ivy:resolve] .......................................
[ivy:resolve] ..................................
[ivy:resolve] ........................................
[ivy:resolve] ..................................
[ivy:resolve] .........................................
[ivy:resolve] .............................................
[ivy:resolve] ...................................
[ivy:resolve] ......................
[ivy:resolve] .........................................
[ivy:resolve] ..........................................
[ivy:resolve] .............................................
[ivy:resolve] ........................................
[ivy:resolve] .....................................
[ivy:resolve] ............. (6928kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.hadoop#hadoop-hdfs;2.5.2!hadoop-hdfs.jar (33894ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.5.2/hadoop-mapreduce-client-core-2.5.2.jar ...
[ivy:resolve] ....................
[ivy:resolve] .......................
[ivy:resolve] .........................
[ivy:resolve] ..............................
[ivy:resolve] ...............
[ivy:resolve] ...................
[ivy:resolve] ................. (1463kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.hadoop#hadoop-mapreduce-client-core;2.5.2!hadoop-mapreduce-client-core.jar (12531ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.5.2/hadoop-mapreduce-client-jobclient-2.5.2.jar ...
[ivy:resolve] .. (34kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.5.2!hadoop-mapreduce-client-jobclient.jar (1075ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet/2.2.3/org.restlet-2.2.3.jar ...
[ivy:resolve] ......................
[ivy:resolve] ..........................
[ivy:resolve] ......................... (670kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.restlet.jse#org.restlet;2.2.3!org.restlet.jar (7877ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet.ext.jackson/2.2.3/org.restlet.ext.jackson-2.2.3.jar ...
[ivy:resolve] ... (7kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.restlet.jse#org.restlet.ext.jackson;2.2.3!org.restlet.ext.jackson.jar (2971ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet.ext.jaxrs/2.2.3/org.restlet.ext.jaxrs-2.2.3.jar ...
[ivy:resolve] ...................
[ivy:resolve] ............ (305kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] org.restlet.jse#org.restlet.ext.jaxrs;2.2.3!org.restlet.ext.jaxrs.jar (5760ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar ...
[ivy:resolve] ....................... (239kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]  [SUCCESSFUL ] junit#junit;4.11!junit.jar (718ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/hsqldb/hsqldb/2.2.8/hsqldb-2.2.8.jar ...
[ivy:resolve] ............................
[ivy:resolve] .........................
[ivy:resolve] ....................


Configure Nutch

$NUTCH_HOME/runtime/local/conf/nutch-site.xml :

<configuration>
 <property>
    <name>http.agent.name</name>
    <value>Nutty Spider</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
  <property>
    <name>plugin.includes</name>     <value>protocol-httpclient|urlfilter-regex|parse-(text|tika|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|indexer-elastic</value>
  </property>
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
  <property>
    <name>elastic.host</name>
    <value>10.0.2.41</value>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
  <property>
    <name>elastic.cluster</name>
    <value>elasticsearch</value>
  </property>
  <property>
    <name>elastic.index</name>
    <value>nutchindex</value>
  </property>
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
  </property>
  <property>
    <name>http.content.limit</name>
    <value>6553600</value>
  </property>
  <property>
  <name>elastic.max.bulk.docs</name>
  <value>250</value>
<description>Maximum size of the bulk in number of documents.</description>
</property>
<property>
  <name>elastic.max.bulk.size</name>
  <value>2500500</value>
  <description>Maximum size of the bulk in bytes.</description>
</property>
</configuration>


Simple test


nutch@45883500b170:~/apache-nutch-2.3.1/runtime/local/bin$ ./nutch inject ~/nutch/testseed 
InjectorJob: starting at 2016-08-30 09:48:49
InjectorJob: Injecting urlDir: /home/nutch/nutch/testseed
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 0
Injector: finished at 2016-08-30 09:48:58, elapsed: 00:00:08


Crawling the web and indexing with Elasticsearch


9300 - Elasticsearch native Java transport port
9200 - RESTful API port

nutch@45883500b170:~$ cat seed/urls.txt 
https://en.wikipedia.org
nutch@45883500b170:~$ nutch inject seed/urls.txt 
InjectorJob: starting at 2016-08-30 10:24:37
InjectorJob: Injecting urlDir: seed/urls.txt
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2016-08-30 10:24:41, elapsed: 00:00:03
nutch@45883500b170:~$ nutch generate -topN 40
GeneratorJob: starting at 2016-08-30 10:25:02
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 40
GeneratorJob: finished at 2016-08-30 10:25:07, time elapsed: 00:00:04
GeneratorJob: generated batch id: 1472552702-144817008 containing 1 URLs
nutch@45883500b170:~$ nutch fetch -all
FetcherJob: starting at 2016-08-30 10:25:16
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching https://en.wikipedia.org/ (queue crawl delay=5000ms)
-finishing thread FetcherThread2, activeThreads=8
-finishing thread FetcherThread6, activeThreads=6
-finishing thread FetcherThread5, activeThreads=5
-finishing thread FetcherThread7, activeThreads=7
-finishing thread FetcherThread8, activeThreads=7
-finishing thread FetcherThread4, activeThreads=4
-finishing thread FetcherThread3, activeThreads=3
-finishing thread FetcherThread1, activeThreads=2
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread9, activeThreads=9
-finishing thread FetcherThread1, activeThreads=8
-finishing thread FetcherThread2, activeThreads=7
-finishing thread FetcherThread0, activeThreads=1
-finishing thread FetcherThread7, activeThreads=2
-finishing thread FetcherThread6, activeThreads=3
-finishing thread FetcherThread5, activeThreads=4
-finishing thread FetcherThread4, activeThreads=5
-finishing thread FetcherThread3, activeThreads=6
-finishing thread FetcherThread8, activeThreads=0
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: finished at 2016-08-30 10:25:32, time elapsed: 00:00:16
nutch@45883500b170:~$
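
Once the remaining parse/updatedb/index steps have been run (not shown here), the documents land in Elasticsearch and can be inspected over the RESTful port 9200. A sketch using the elastic.host and elastic.index values from nutch-site.xml above; the stored field names depend on the indexing plugins:

import requests

# elastic.host and elastic.index as configured in nutch-site.xml
ES = "http://10.0.2.41:9200"

# Simple full-text query against the nutchindex index
resp = requests.get(ES + "/nutchindex/_search",
                    params={"q": "wikipedia", "size": 5})
hits = resp.json().get("hits", {})
print("total hits:", hits.get("total"))
for hit in hits.get("hits", []):
    print(hit["_id"], hit["_source"].get("title"))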