Friday, December 30, 2016
Docker Daemon
The Docker daemon can listen for Docker Remote API requests via three different types of socket: unix, tcp, and fd.
unix :- /var/run/docker.sock (default)
tcp :- for remote access; supports TLS 1.0 and greater, e.g.
-H tcp://0.0.0.0:2375 # listen on all interfaces
-H tcp://192.168.59.103:2375 # listen on a specific IP address
*** Conventional Docker daemon ports: 2375 (unencrypted), 2376 (TLS)
fd :- systemd socket-activated file descriptors, e.g.
-H fd://
-H fd://3
Multiple sockets can be specified at once, like this:
dockerd -H unix:///var/run/docker.sock -H tcp://192.168.59.106 -H tcp://10.10.10.2
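With a tcp socket enabled, the Remote API can be exercised with any HTTP client. Below is a minimal sketch in Python using the requests library; the host and port are taken from the examples above, and it assumes the daemon is running without TLS, so use it only on a trusted network:

import requests

# Hypothetical daemon address from the examples above (no TLS, port 2375)
BASE = "http://192.168.59.103:2375"

# /version and /containers/json are standard Docker Remote API endpoints
print(requests.get(BASE + "/version").json()["Version"])
for c in requests.get(BASE + "/containers/json").json():
    print(c["Id"][:12], c["Image"], c["Status"])

For anything reachable from an untrusted network, prefer port 2376 with --tlsverify and client certificates.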
Thursday, December 22, 2016
How to deploy a Scrapy project to scrapinghub.com and save the crawled data to MongoDB
Required pip module:
pymongo
ubuntu@nutthaphongmail:~$ docker exec -it --user root scrapyd-server bash
root@6b82490da131:/home/scrapyd# pip install shub
Collecting shub
  Downloading shub-2.5.0-py2.py3-none-any.whl (47kB)
    100% |################################| 51kB 773kB/s
Collecting requests (from shub)
  Downloading requests-2.12.4-py2.py3-none-any.whl (576kB)
    100% |################################| 583kB 1.3MB/s
Requirement already satisfied: six in /usr/local/lib/python2.7/dist-packages (from shub)
Collecting scrapinghub (from shub)
  Downloading scrapinghub-1.9.0-py2-none-any.whl
Requirement already satisfied: pip in /usr/local/lib/python2.7/dist-packages (from shub)
Collecting PyYAML (from shub)
  Downloading PyYAML-3.12.tar.gz (253kB)
    100% |################################| 256kB 2.5MB/s
Collecting click (from shub)
  Downloading click-6.6-py2.py3-none-any.whl (71kB)
    100% |################################| 71kB 4.2MB/s
Collecting retrying (from shub)
  Downloading retrying-1.3.3.tar.gz
Collecting docker-py (from shub)
  Downloading docker_py-1.10.6-py2.py3-none-any.whl (50kB)
    100% |################################| 51kB 4.2MB/s
Collecting backports.ssl-match-hostname>=3.5; python_version < "3.5" (from docker-py->shub)
  Downloading backports.ssl_match_hostname-3.5.0.1.tar.gz
Collecting websocket-client>=0.32.0 (from docker-py->shub)
  Downloading websocket_client-0.40.0.tar.gz (196kB)
    100% |################################| 204kB 2.7MB/s
Requirement already satisfied: ipaddress>=1.0.16; python_version < "3.3" in /usr/local/lib/python2.7/dist-packages (from docker-py->shub)
Collecting docker-pycreds>=0.2.1 (from docker-py->shub)
  Downloading docker_pycreds-0.2.1-py2.py3-none-any.whl
Building wheels for collected packages: PyYAML, retrying, backports.ssl-match-hostname, websocket-client
  Running setup.py bdist_wheel for PyYAML ... done
  Stored in directory: /root/.cache/pip/wheels/2c/f7/79/13f3a12cd723892437c0cfbde1230ab4d82947ff7b3839a4fc
  Running setup.py bdist_wheel for retrying ... done
  Stored in directory: /root/.cache/pip/wheels/d9/08/aa/49f7c109140006ea08a7657640aee3feafb65005bcd5280679
  Running setup.py bdist_wheel for backports.ssl-match-hostname ... done
  Stored in directory: /root/.cache/pip/wheels/5d/72/36/b2a31507b613967b728edc33378a5ff2ada0f62855b93c5ae1
  Running setup.py bdist_wheel for websocket-client ... done
  Stored in directory: /root/.cache/pip/wheels/d1/5e/dd/93da015a0ecc8375278b05ad7f0452eff574a044bcea2a95d2
Successfully built PyYAML retrying backports.ssl-match-hostname websocket-client
Installing collected packages: requests, retrying, scrapinghub, PyYAML, click, backports.ssl-match-hostname, websocket-client, docker-pycreds, docker-py, shub
Successfully installed PyYAML-3.12 backports.ssl-match-hostname-3.5.0.1 click-6.6 docker-py-1.10.6 docker-pycreds-0.2.1 requests-2.12.4 retrying-1.3.3 scrapinghub-1.9.0 shub-2.5.0 websocket-client-0.40.0
root@6b82490da131:/home/scrapyd# exit
exit
ubuntu@nutthaphongmail:~$ docker exec -it scrapyd-server bash
scrapyd@6b82490da131:~$ shub
Usage: shub [OPTIONS] COMMAND [ARGS]...

  shub is the Scrapinghub command-line client. It allows you to deploy
  projects or dependencies, schedule spiders, and retrieve scraped data or
  logs without leaving the command line.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.
Commands:
  copy-eggs     Sync eggs from one project with other project
  deploy        Deploy Scrapy project to Scrapy Cloud
  deploy-egg    [DEPRECATED] Build and deploy egg from source
  deploy-reqs   [DEPRECATED] Build and deploy eggs from requirements.txt
  fetch-eggs    Download project eggs from Scrapy Cloud
  image         Manage project based on custom Docker image
  items         Fetch items from Scrapy Cloud
  log           Fetch log from Scrapy Cloud
  login         Save your Scrapinghub API key
  logout        Forget saved Scrapinghub API key
  migrate-eggs  Migrate dash eggs to requirements.txt and project's directory
  requests      Fetch requests from Scrapy Cloud
  schedule      Schedule a spider to run on Scrapy Cloud
  version       Show shub version

For usage and help on a specific command, run it with a --help flag, e.g.:

    shub schedule --help
scrapyd@6b82490da131:~$ pwd
/home/scrapyd
scrapyd@6b82490da131:~/projects/PyLearning$ cd stack/
scrapyd@6b82490da131:~/projects/PyLearning/stack$ cat requirements.txt
pymongo==3.4.0
scrapyd@6b82490da131:~/projects/PyLearning/stack$ cat scrapinghub.yml
projects:
  default: 136494
requirements_file: requirements.txt
scrapyd@6b82490da131:~/projects/PyLearning/stack$ shub login
-------------------------------------------------------------------------------
Welcome to shub version 2!

This release contains major updates to how shub is configured, as well as
updates to the commands and shub's look & feel.

Run 'shub' to get an overview over all available commands, and
'shub command --help' to get detailed help on a command. Definitely try the
new 'shub items -f [JOBID]' to see items live as they are being scraped!

From now on, shub configuration should be done in a file called
'scrapinghub.yml', living next to the previously used 'scrapy.cfg' in your
Scrapy project directory. Global configuration, for example API keys, should
be done in a file called '.scrapinghub.yml' in your home directory.

But no worries, shub has automatically migrated your global settings to
~/.scrapinghub.yml, and will also automatically migrate your project settings
when you run a command within a Scrapy project.

Visit http://doc.scrapinghub.com/shub.html for more information on the new
configuration format and its benefits.

Happy scraping!
-------------------------------------------------------------------------------
Enter your API key from https://app.scrapinghub.com/account/apikey
API key: e4bfa1fd7f8d4d9da817aa112bb82095
Validating API key...
API key is OK, you are logged in now.
scrapyd@6b82490da131:~/projects/PyLearning/stack$ shub deploy
Target project ID: 136494
Save as default [Y/n]: Y
Project 136494 was set as default in scrapinghub.yml.
You can deploy to it via 'shub deploy' from now on.
Packing version e7d7f6c-inet1
Deploying to Scrapy Cloud project "136494"
{"spiders": 1, "status": "ok", "project": 136494, "version": "e7d7f6c-inet1"}
Run your spiders at: https://app.scrapinghub.com/p/136494/
* The API key comes from https://app.scrapinghub.com/account/apikey
** The project ID for a new project can be found at https://app.scrapinghub.com/p/PROJECT_ID/deploy
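The crawled items are persisted to MongoDB through a Scrapy item pipeline built on pymongo. The post does not show the stack project's pipeline, so here is a minimal sketch following the standard Scrapy pipeline pattern; the setting names (MONGO_URI, MONGO_DATABASE) and the collection name are assumptions:

# pipelines.py - minimal sketch of a MongoDB item pipeline (hypothetical names)
import pymongo

class MongoPipeline(object):
    collection_name = 'stack_items'  # assumed collection name

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_URI / MONGO_DATABASE are assumed setting names
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items'),
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # one document per scraped item
        self.db[self.collection_name].insert_one(dict(item))
        return item

Enable it in settings.py with ITEM_PIPELINES = {'stack.pipelines.MongoPipeline': 300}, and point MONGO_URI at a MongoDB instance reachable from Scrapy Cloud.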
Tuesday, November 22, 2016
Deploy the dataset project to Scrapyd
scrapyd@a4c2642d74db:~/projects$ git clone https://github.com/nutthaphon/PyLearning.git
Cloning into 'PyLearning'...
remote: Counting objects: 323, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 323 (delta 0), reused 0 (delta 0), pack-reused 317
Receiving objects: 100% (323/323), 61.27 KiB | 0 bytes/s, done.
Resolving deltas: 100% (146/146), done.
Checking connectivity... done.
scrapyd@a4c2642d74db:~/projects$ cd PyLearning/
scrapyd@a4c2642d74db:~/projects/PyLearning$ ls
Animals  CherryPy  DJango  README.md  Scraping  dataset  decor  foo  serial  test  tutorial
scrapyd@a4c2642d74db:~/projects/PyLearning$ cd dataset/
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ls
__init__.py  __init__.pyc  build  project.egg-info  scrapinghub.yml  scrapy.cfg  settrade  setup.py
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ sed -i 's/#url/url/g' scrapy.cfg
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ cat scrapy.cfg
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = settrade.settings

[deploy]
url = http://localhost:6800/
project = settrade
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ cd settrade/
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade$ ls
DailyStockQuote.py   QueryStockSymbol.py   ThinkSpeakChannels.py   __init__.py   items.py      scrapinghub.yml  settings.pyc
DailyStockQuote.pyc  QueryStockSymbol.pyc  ThinkSpeakChannels.pyc  __init__.pyc  pipelines.py  settings.py      spiders
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade$ cd spiders/
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade/spiders$ ls
SettradeSpider.py  SettradeSpider.pyc  __init__.py  __init__.pyc
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade/spiders$ cd ..
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset/settrade$ cd ..
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ls
__init__.py  __init__.pyc  build  project.egg-info  scrapinghub.yml  scrapy.cfg  settrade  setup.py
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy -l
default              http://localhost:6800/
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy default -p dataset
Packing version 1479804720
Deploying to project "dataset" in http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "dataset", "version": "1479804720", "spiders": 1, "node_name": "a4c2642d74db"}
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy -L default
tutorial
dataset
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ curl http://localhost:6800/listprojects.json
{"status": "ok", "projects": ["tutorial", "dataset"], "node_name": "a4c2642d74db"}
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ curl http://localhost:6800/listspiders.json?project=dataset
{"status": "ok", "spiders": ["settrade_dataset"], "node_name": "a4c2642d74db"}
scrapyd@a4c2642d74db:~/projects/PyLearning/dataset$ curl http://localhost:6800/schedule.json -d project=dataset -d spider=settrade_dataset -d setting=DOWNLOAD_DELAY=5 -d start_urls=http://www.settrade.com/servlet/IntradayStockChartDataServlet?symbol=INTUCH,http://www.settrade.com/servlet/IntradayStockChartDataServlet?symbol=CPF,http://www.settrade.com/servlet/IntradayStockChartDataServlet?symbol=ADVANC
{"status": "ok", "jobid": "c60e4566b09111e68c380242ac110002", "node_name": "a4c2642d74db"}
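The schedule.json call above returns a jobid; the job's state can then be polled through Scrapyd's listjobs.json endpoint. A minimal sketch with Python's requests library (host and project name taken from the session above):

import requests

# listjobs.json reports pending/running/finished jobs for a project
resp = requests.get('http://localhost:6800/listjobs.json',
                    params={'project': 'dataset'}).json()
for state in ('pending', 'running', 'finished'):
    for job in resp.get(state, []):
        print(state, job['id'])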
Deploying your project to Scrapyd
ubuntu@node2:~/Docker/nutthaphon/scrapyd/user$ docker run --name scrapyd-server --user scrapyd -it -P nutthaphon/scrapyd:1.1.1 bash
scrapyd@a4c2642d74db:~$ pwd
/home/scrapyd
scrapyd@a4c2642d74db:~$ ls
master.zip  scrapyd-client-master  setuptools-28.8.0.zip
scrapyd@a4c2642d74db:~$ mkdir projects
scrapyd@a4c2642d74db:~$ cd projects/
scrapyd@a4c2642d74db:~/projects$ scrapy startproject tutorial
New Scrapy project 'tutorial', using template directory '/usr/local/lib/python2.7/dist-packages/scrapy/templates/project', created in:
    /home/scrapyd/projects/tutorial

You can start your first spider with:
    cd tutorial
    scrapy genspider example example.com
scrapyd@a4c2642d74db:~/projects$ cd tutorial/
scrapyd@a4c2642d74db:~/projects/tutorial$ ls
scrapy.cfg  tutorial
scrapyd@a4c2642d74db:~/projects/tutorial$ vi scrapy.cfg
bash: vi: command not found
scrapyd@a4c2642d74db:~/projects/tutorial$ cat scrapy.cfg
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = tutorial.settings

[deploy]
#url = http://localhost:6800/
project = tutorial
scrapyd@a4c2642d74db:~/projects/tutorial$ sed -i 's/#utl/url/g' scrapy.cfg
scrapyd@a4c2642d74db:~/projects/tutorial$ cat scrapy.cfg
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = tutorial.settings

[deploy]
#url = http://localhost:6800/
project = tutorial
scrapyd@a4c2642d74db:~/projects/tutorial$ sed -i 's/#url/url/g' scrapy.cfg
scrapyd@a4c2642d74db:~/projects/tutorial$ cat scrapy.cfg
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# https://scrapyd.readthedocs.org/en/latest/deploy.html

[settings]
default = tutorial.settings

[deploy]
url = http://localhost:6800/
project = tutorial
scrapyd@a4c2642d74db:~/projects/tutorial$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy -l
default              http://localhost:6800/
scrapyd@a4c2642d74db:~/projects/tutorial$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy default -p tutorial
Packing version 1479799362
Deploying to project "tutorial" in http://localhost:6800/addversion.json
Deploy failed: <urlopen error [Errno 111] Connection refused>
scrapyd@a4c2642d74db:~/projects/tutorial$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy default -p tutorial
Packing version 1479799403
Deploying to project "tutorial" in http://localhost:6800/addversion.json
Server response (200): {"status": "ok", "project": "tutorial", "version": "1479799403", "spiders": 0, "node_name": "a4c2642d74db"}
scrapyd@a4c2642d74db:~/projects/tutorial$ ~/scrapyd-client-master/scrapyd-client/scrapyd-deploy -L default
tutorial
Wednesday, October 26, 2016
WildFly and JSF
Check the existing JSF subsystem and the default module
[jboss@69c9222702f3 bin]$ ./jboss-cli.sh
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect 10.80.1.209:9990
Authenticating against security realm: ManagementRealm
Username: admin
Password:
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:list-active-jsf-impls
{
    "outcome" => "success",
    "result" => ["main"]
}
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:read-attribute(name=default-jsf-impl-slot)
{
    "outcome" => "success",
    "result" => "main"
}
[standalone@10.80.1.209:9990 /]
Deploy the new JSF 1.2 implementation and restart WildFly
[standalone@10.80.1.209:9990 /] deploy /install-mojarra-1.2_15.cli
-or- download the files from GitHub
Verify the installed module, change the default to the newly added one, and restart WildFly
[jboss@69c9222702f3 bin]$ ./jboss-cli.sh
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect 10.80.1.209:9990
Authenticating against security realm: ManagementRealm
Username: admin
Password:
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:list-active-jsf-impls
{
    "outcome" => "success",
    "result" => [
        "mojarra-1.2_15",
        "main"
    ]
}
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:read-attribute(name=default-jsf-impl-slot)
{
    "outcome" => "success",
    "result" => "main"
}
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:write-attribute(name=default-jsf-impl-slot,value=mojarra-1.2_15)
{
    "outcome" => "success",
    "response-headers" => {
        "operation-requires-reload" => true,
        "process-state" => "reload-required"
    }
}
[standalone@10.80.1.209:9990 /]
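Note the reload-required state in the response headers: apply it with the CLI's standard :reload operation (or restart WildFly) so the new default takes effect.

[standalone@10.80.1.209:9990 /] :reload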
Confirm the default is now JSF 1.2
[jboss@69c9222702f3 bin]$ ./jboss-cli.sh
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect 10.80.1.209:9990
Authenticating against security realm: ManagementRealm
Username: admin
Password:
[standalone@10.80.1.209:9990 /] /subsystem=jsf/:read-attribute(name=default-jsf-impl-slot)
{
    "outcome" => "success",
    "result" => "mojarra-1.2_15"
}
Test JSF 1.2 Web Application
Reference: Steps to add any new JSF implementation or version to WildFly
Tuesday, October 11, 2016
JMeter on containers
nutt@nutt-pc:~$ docker network create --driver=bridge --subnet=172.27.0.0/16 \
> --ip-range=172.27.5.0/24 \
> --gateway=172.27.5.254 \
> my-net
1721e834110c184f92ce2ae1008bf5f5dcd735f38a7d35f6bbcc9eebc4d5de8e
nutt@nutt-pc:~$ docker network inspect my-net
[
    {
        "Name": "my-net",
        "Id": "1721e834110c184f92ce2ae1008bf5f5dcd735f38a7d35f6bbcc9eebc4d5de8e",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.27.0.0/16",
                    "IPRange": "172.27.5.0/24",
                    "Gateway": "172.27.5.254"
                }
            ]
        },
        "Internal": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]
nutt@nutt-pc:~$ docker run -it --net="my-net" --add-host="jmeter-server:172.27.5.1" --mac-address="9a:17:a0:c7:b4:cb" --ip="172.27.5.1" --name jmeter-server1 ubuntu bash
root@c07696364171:/# apt-get update
Get:1 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [95.7 kB]
Get:3 http://archive.ubuntu.com/ubuntu xenial-security InRelease [94.5 kB]
Get:4 http://archive.ubuntu.com/ubuntu xenial/main Sources [1103 kB]
Get:5 http://archive.ubuntu.com/ubuntu xenial/restricted Sources [5179 B]
Get:6 http://archive.ubuntu.com/ubuntu xenial/universe Sources [9802 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:8 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:9 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:10 http://archive.ubuntu.com/ubuntu xenial-updates/main Sources [244 kB]
Get:11 http://archive.ubuntu.com/ubuntu xenial-updates/universe Sources [124 kB]
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [506 kB]
Get:13 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [418 kB]
Get:14 http://archive.ubuntu.com/ubuntu xenial-security/main Sources [51.4 kB]
Get:15 http://archive.ubuntu.com/ubuntu xenial-security/universe Sources [11.1 kB]
Get:16 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages [191 kB]
Get:17 http://archive.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [52.5 kB]
Fetched 24.3 MB in 11s (2109 kB/s)
Reading package lists... Done
Create a Dockerfile
Monday, September 19, 2016
Python Modules and Packages
Working directory
PyLearning
├── Animals
├── foo
├── README.md
└── test
foo/
├── bar.py
├── fibo.py
├── fibo.pyc
├── __init__.py
└── __init__.pyc
test/
├── Backwards.py
├── Backwards.pyc
├── callFibo.py
├── callFibo.pyc
├── Card.py
├── Foo.py
├── Foo.pyc
├── __init__.py
├── __init__.pyc
├── mystuff.py
├── mystuff.pyc
└── support.py
fibo.py
def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

def fib2(n):   # build Fibonacci series up to n as a list, then print it
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    print result

def fib3(n):   # same as fib(): write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b
callFibo.py
from foo.fibo import fib, fib2, fib3

print('Call to fib3()')
fib3(100)
print('\n')

print('Call to fib2()')
fib2(100)
print('\n')

print('Call to fib()')
fib(1000)
print('\n')

print('Call to instance of fib()')
fib = fib   # functions are objects and can be bound to (re)named variables
fib(500)
Output
Call to fib3()
1 1 2 3 5 8 13 21 34 55 89

Call to fib2()
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

Call to fib()
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987

Call to instance of fib()
1 1 2 3 5 8 13 21 34 55 89 144 233 377
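Because foo/ contains an __init__.py it is a regular package, so fibo can be imported in several equivalent ways. A quick sketch (Python 2, matching fibo.py above; run it from the PyLearning directory so that foo is on the module search path):

import foo.fibo            # qualified access: foo.fibo.fib(100)
from foo import fibo       # module access:    fibo.fib(100)
from foo.fibo import fib   # direct access:    fib(100)

foo.fibo.fib(100)
fibo.fib(100)
fib(100)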
Wednesday, September 14, 2016
Ubuntu Port forwarding
Port forwarding
sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 9000 -j DNAT --to-destination 10.0.2.129:9000
sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 9090 -j DNAT --to-destination 10.0.2.129:9090
sudo iptables -t nat -I PREROUTING -p tcp -d 192.168.1.106 --dport 50070 -j DNAT --to-destination 10.0.2.129:50070
sudo iptables -I FORWARD -m state -d 10.0.2.0/24 --state NEW,RELATED,ESTABLISHED -j ACCEPT
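These DNAT rules only work if the kernel is routing packets between interfaces; if forwarding doesn't happen, make sure IP forwarding is enabled (persist it with net.ipv4.ip_forward=1 in /etc/sysctl.conf):

sudo sysctl -w net.ipv4.ip_forward=1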
Saturday, September 10, 2016
Puppet
Puppet has two distributions:
1. WEBrick Puppet (Apache) - services are named puppetmaster, puppetagent, etc.
2. Puppet Labs packages (used in this tutorial)
Puppet Server: Installing From Packages
Puppet Collections and packages
$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-xenial.deb
$ sudo dpkg -i puppetlabs-release-pc1-xenial.deb
$ sudo apt update
$ sudo apt-get install puppetserver
$ sudo systemctl start puppetserver
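Once the service is up, the master listens on TCP 8140 by default; a quick sanity check before configuring agents:

$ sudo systemctl status puppetserver
$ sudo ss -tln | grep 8140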
Puppet agent: Linux
Puppet Collections and packages
$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
$ sudo dpkg -i puppetlabs-release-pc1-trusty.deb
$ sudo apt-get update
** Before installing and configuring the agent, don't forget to add the Puppet Server (master) to /etc/hosts on the agent side. The default Puppet Server name is puppet, so map that name to the correct IP address.
ubuntu@node1:~$ sudo vi /etc/hosts
ubuntu@node1:~$ wget https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
--2016-09-10 22:52:47--  https://apt.puppetlabs.com/puppetlabs-release-pc1-trusty.deb
Resolving apt.puppetlabs.com (apt.puppetlabs.com)... 198.58.114.168, 2600:3c00::f03c:91ff:fe69:6bf0
Connecting to apt.puppetlabs.com (apt.puppetlabs.com)|198.58.114.168|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13652 (13K) [application/x-debian-package]
Saving to: 'puppetlabs-release-pc1-trusty.deb'

100%[================================================================================================================>] 13,652      44.3KB/s   in 0.3s

2016-09-10 22:52:49 (44.3 KB/s) - 'puppetlabs-release-pc1-trusty.deb' saved [13652/13652]

ubuntu@node1:~$ sudo dpkg -i puppetlabs-release-pc1-trusty.deb
Selecting previously unselected package puppetlabs-release-pc1.
(Reading database ... 57362 files and directories currently installed.)
Preparing to unpack puppetlabs-release-pc1-trusty.deb ...
Unpacking puppetlabs-release-pc1 (1.1.0-2trusty) ...
Setting up puppetlabs-release-pc1 (1.1.0-2trusty) ...
ubuntu@node1:~$ sudo apt-get update
Hit https://apt.dockerproject.org ubuntu-trusty InRelease
Ign http://apt.puppetlabs.com trusty InRelease
Hit https://apt.dockerproject.org ubuntu-trusty/main amd64 Packages
Ign http://archive.ubuntu.com trusty InRelease
Get:1 https://apt.dockerproject.org ubuntu-trusty/main Translation-en
Get:2 http://apt.puppetlabs.com trusty Release.gpg [841 B]
Ign https://apt.dockerproject.org ubuntu-trusty/main Translation-en
Get:3 http://archive.ubuntu.com trusty-updates InRelease [65.9 kB]
Get:4 http://apt.puppetlabs.com trusty Release [54.2 kB]
Get:5 http://archive.ubuntu.com trusty-security InRelease [65.9 kB]
Get:6 http://apt.puppetlabs.com trusty/PC1 amd64 Packages [15.6 kB]
Hit http://archive.ubuntu.com trusty Release.gpg
Get:7 http://archive.ubuntu.com trusty-updates/main amd64 Packages [889 kB]
Get:8 http://archive.ubuntu.com trusty-updates/restricted amd64 Packages [15.9 kB]
Get:9 http://archive.ubuntu.com trusty-updates/universe amd64 Packages [373 kB]
Get:10 http://archive.ubuntu.com trusty-updates/multiverse amd64 Packages [14.8 kB]
Ign http://apt.puppetlabs.com trusty/PC1 Translation-en
Get:11 http://archive.ubuntu.com trusty-updates/main Translation-en [431 kB]
Get:12 http://archive.ubuntu.com trusty-updates/multiverse Translation-en [7661 B]
Get:13 http://archive.ubuntu.com trusty-updates/restricted Translation-en [3699 B]
Get:14 http://archive.ubuntu.com trusty-updates/universe Translation-en [197 kB]
Get:15 http://archive.ubuntu.com trusty-security/main amd64 Packages [524 kB]
Get:16 http://archive.ubuntu.com trusty-security/restricted amd64 Packages [13.0 kB]
Get:17 http://archive.ubuntu.com trusty-security/universe amd64 Packages [136 kB]
Get:18 http://archive.ubuntu.com trusty-security/multiverse amd64 Packages [4990 B]
Get:19 http://archive.ubuntu.com trusty-security/main Translation-en [288 kB]
Get:20 http://archive.ubuntu.com trusty-security/multiverse Translation-en [2570 B]
Get:21 http://archive.ubuntu.com trusty-security/restricted Translation-en [3206 B]
Get:22 http://archive.ubuntu.com trusty-security/universe Translation-en [81.3 kB]
Hit http://archive.ubuntu.com trusty Release
Hit http://archive.ubuntu.com trusty/main amd64 Packages
Hit http://archive.ubuntu.com trusty/restricted amd64 Packages
Hit http://archive.ubuntu.com trusty/universe amd64 Packages
Hit http://archive.ubuntu.com trusty/multiverse amd64 Packages
Hit http://archive.ubuntu.com trusty/main Translation-en
Hit http://archive.ubuntu.com trusty/multiverse Translation-en
Hit http://archive.ubuntu.com trusty/restricted Translation-en
Hit http://archive.ubuntu.com trusty/universe Translation-en
Fetched 3187 kB in 18s (171 kB/s)
Reading package lists... Done
ubuntu@node1:~$ sudo apt-get install puppet-agent
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  puppet-agent
0 upgraded, 1 newly installed, 0 to remove and 12 not upgraded.
Need to get 15.1 MB of archives.
After this operation, 81.8 MB of additional disk space will be used.
Get:1 http://apt.puppetlabs.com/ trusty/PC1 puppet-agent amd64 1.6.2-1trusty [15.1 MB]
Fetched 15.1 MB in 0s (60.7 MB/s)
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_TIME = "th_TH.UTF-8",
        LC_MONETARY = "th_TH.UTF-8",
        LC_ADDRESS = "th_TH.UTF-8",
        LC_TELEPHONE = "th_TH.UTF-8",
        LC_NAME = "th_TH.UTF-8",
        LC_MEASUREMENT = "th_TH.UTF-8",
        LC_IDENTIFICATION = "th_TH.UTF-8",
        LC_NUMERIC = "th_TH.UTF-8",
        LC_PAPER = "th_TH.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_ALL to default locale: No such file or directory
Selecting previously unselected package puppet-agent.
(Reading database ... 57367 files and directories currently installed.)
Preparing to unpack .../puppet-agent_1.6.2-1trusty_amd64.deb ...
Unpacking puppet-agent (1.6.2-1trusty) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up puppet-agent (1.6.2-1trusty) ...
update-rc.d: warning: start runlevel arguments (none) do not match pxp-agent Default-Start values (2 3 4 5)
update-rc.d: warning: stop runlevel arguments (none) do not match pxp-agent Default-Stop values (0 1 6)
Processing triggers for ureadahead (0.100.0-16) ...
ubuntu@node1:~$ sudo /opt/puppetlabs/bin/puppet resource service puppet ensure=running enable=true
2016-09-10 22:54:37.681893 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Notice: /Service[puppet]/ensure: ensure changed 'stopped' to 'running'
service { 'puppet':
  ensure => 'running',
  enable => 'true',
}
ubuntu@node1:~$ sudo /opt/puppetlabs/bin/puppet agent --test
2016-09-10 22:57:11.396529 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for node1
Info: Applying configuration version '1473523040'
Notice: Applied catalog in 0.24 seconds
Sign certificates on the CA master
nutt@nutt-pc:~/Downloads$ sudo /opt/puppetlabs/bin/puppet cert list
Warning: Facter: Could not process routing table entry: Expected a destination followed by key/value pairs, got '192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown'
  "node1" (SHA256) CE:54:74:42:95:4A:C5:44:20:90:23:26:3C:63:F0:4D:71:79:12:BC:06:CC:0A:6A:ED:DE:E4:BD:AA:77:C2:A3
nutt@nutt-pc:~/Downloads$ sudo /opt/puppetlabs/bin/puppet cert sign node1
Warning: Facter: Could not process routing table entry: Expected a destination followed by key/value pairs, got '192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown'
Signing Certificate Request for:
  "node1" (SHA256) CE:54:74:42:95:4A:C5:44:20:90:23:26:3C:63:F0:4D:71:79:12:BC:06:CC:0A:6A:ED:DE:E4:BD:AA:77:C2:A3
Notice: Signed certificate request for node1
Notice: Removing file Puppet::SSL::CertificateRequest node1 at '/etc/puppetlabs/puppet/ssl/ca/requests/node1.pem'
Master-Agent Simple Setup
Main file /etc/puppetlabs/code/environments/production/manifests/site.pp
Applies to all nodes
file {'/tmp/example-ip':                                            # resource type file and filename
  ensure  => present,                                               # make sure it exists
  mode    => '0644',                                                # file permissions
  content => "Here is my Public IP Address: ${ipaddress_eth0}.\n",  # note the ipaddress_eth0 fact
}
Applies to specific nodes
node 'ns1', 'ns2' {                   # applies to ns1 and ns2 nodes
  file {'/tmp/dns':                   # resource type file and filename
    ensure  => present,               # make sure it exists
    mode    => '0644',
    content => "Only DNS servers get this file.\n",
  }
}

node default {}                       # applies to nodes that aren't explicitly defined
Resources propagate to nodes on the agents' regular run schedule, or you can apply them immediately with 'puppet agent --test'.
As a result of the configuration above, every node gets /tmp/example-ip, and ns1 and ns2 additionally get /tmp/dns.
Puppet Apply (run a manifest standalone)
Optional: put the Puppet binaries on your PATH (note that only the root user can run Puppet):

PATH=/opt/puppetlabs/bin:$PATH; export PATH

docker_example.pp :
include 'docker'

docker::run { 'helloworld':
  image   => 'ubuntu:precise',
  command => '/bin/sh -c "while true; do echo hello world; sleep 1; done"',
}
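The manifest can then be applied locally with puppet apply, no master required. Note that docker::run is not built in; it comes from a Docker module such as garethr-docker (the module name here is an assumption about this setup):

$ sudo /opt/puppetlabs/bin/puppet module install garethr-docker
$ sudo /opt/puppetlabs/bin/puppet apply docker_example.pp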
Troubleshooting
Some clients may hit an error like the one below when running Puppet commands:
root@node2:~/Docker/puppet# facter
2016-09-11 00:21:25.117804 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
2016-09-11 00:21:25.146927 FATAL puppetlabs.facter - unhandled exception: boost::filesystem::current_path: No such file or directory
root@node2:~/Docker/puppet# puppet agent --test
2016-09-11 00:23:03.006097 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C LC_ALL=C
terminate called after throwing an instance of 'boost::filesystem::filesystem_error'
  what():  boost::filesystem::current_path: No such file or directory
Aborted (core dumped)
This is because the server is missing a library such as 'libboost-filesystem-dev'.
Monday, September 5, 2016
Apache Nutch
ubuntu@node2:~$ docker exec -it hbase bash
root@45883500b170:/# useradd nutch -m -s /bin/bash
root@45883500b170:/# passwd nutch
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
root@45883500b170:/# exit
exit
ubuntu@node2:~$ docker exec -it --user nutch hbase bash
nutch@45883500b170:/$ pwd
/
nutch@45883500b170:/$ cd
nutch@45883500b170:~$ pwd
/home/nutch
nutch@45883500b170:~$ tar xzvf /software/apache-nutch-2.3.1-src.tar.gz
apache-nutch-2.3.1/conf/
apache-nutch-2.3.1/docs/
apache-nutch-2.3.1/docs/api/
apache-nutch-2.3.1/docs/api/org/
apache-nutch-2.3.1/docs/api/org/apache/
apache-nutch-2.3.1/docs/api/org/apache/nutch/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/lang/
apache-nutch-2.3.1/docs/api/org/apache/nutch/analysis/lang/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/db/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/impl/db/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/misc/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/misc/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/request/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/request/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/response/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/model/response/class-use/
apache-nutch-2.3.1/docs/api/org/apache/nutch/api/resources/
...
$NUTCH_HOME/ivy/ivy.xml :
<dependency org="org.apache.gora" name="gora-hbase" rev="0.6.1" conf="*->default" />
<dependency org="org.apache.hbase" name="hbase-common" rev="0.98.8-hadoop2" conf="*->default" />
$NUTCH_HOME/conf/gora.properties :
############################
# HBaseStore properties    #
############################
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
gora.datastore.autocreateschema=true
gora.datastore.scanner.caching=1000
hbase.client.autoflush.default=false
nutch@45883500b170:~/apache-nutch-2.3.1$ ant clean
Buildfile: /home/nutch/apache-nutch-2.3.1/build.xml
Trying to override old definition of task javac
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

clean-build:
   [delete] Deleting directory /home/nutch/apache-nutch-2.3.1/build

clean-lib:

clean-dist:

clean-runtime:

clean:

BUILD SUCCESSFUL
Total time: 0 seconds
nutch@45883500b170:~/apache-nutch-2.3.1$ ant runtime
Buildfile: /home/nutch/apache-nutch-2.3.1/build.xml
Trying to override old definition of task javac
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

ivy-probe-antlib:

ivy-download:
  [taskdef] Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.

ivy-download-unchecked:

ivy-init-antlib:

ivy-init:

init:
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/classes
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/release
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/test
    [mkdir] Created dir: /home/nutch/apache-nutch-2.3.1/build/test/classes

clean-lib:

resolve-default:
[ivy:resolve] :: Apache Ivy 2.4.0 - 20141213170938 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /home/nutch/apache-nutch-2.3.1/ivy/ivysettings.xml
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/solr/solr-solrj/4.6.0/solr-solrj-4.6.0.jar ... (393kB)
[ivy:resolve] [SUCCESSFUL ] org.apache.solr#solr-solrj;4.6.0!solr-solrj.jar (4382ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.5.2/hadoop-common-2.5.2.jar ... (2894kB)
[ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-common;2.5.2!hadoop-common.jar (21544ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/2.5.2/hadoop-hdfs-2.5.2.jar ... (6928kB)
[ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-hdfs;2.5.2!hadoop-hdfs.jar (33894ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.5.2/hadoop-mapreduce-client-core-2.5.2.jar ... (1463kB)
[ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-mapreduce-client-core;2.5.2!hadoop-mapreduce-client-core.jar (12531ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-jobclient/2.5.2/hadoop-mapreduce-client-jobclient-2.5.2.jar ... (34kB)
[ivy:resolve] [SUCCESSFUL ] org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.5.2!hadoop-mapreduce-client-jobclient.jar (1075ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet/2.2.3/org.restlet-2.2.3.jar ... (670kB)
[ivy:resolve] [SUCCESSFUL ] org.restlet.jse#org.restlet;2.2.3!org.restlet.jar (7877ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet.ext.jackson/2.2.3/org.restlet.ext.jackson-2.2.3.jar ... (7kB)
[ivy:resolve] [SUCCESSFUL ] org.restlet.jse#org.restlet.ext.jackson;2.2.3!org.restlet.ext.jackson.jar (2971ms)
[ivy:resolve] downloading http://maven.restlet.org/org/restlet/jse/org.restlet.ext.jaxrs/2.2.3/org.restlet.ext.jaxrs-2.2.3.jar ... (305kB)
[ivy:resolve] [SUCCESSFUL ] org.restlet.jse#org.restlet.ext.jaxrs;2.2.3!org.restlet.ext.jaxrs.jar (5760ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar ... (239kB)
[ivy:resolve] [SUCCESSFUL ] junit#junit;4.11!junit.jar (718ms)
[ivy:resolve] downloading http://repo1.maven.org/maven2/org/hsqldb/hsqldb/2.2.8/hsqldb-2.2.8.jar ...
Configure Nutch
$NUTCH_HOME/runtime/local/conf/nutch-site.xml :
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>Nutty Spider</value>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.hbase.store.HBaseStore</value>
    <description>Default class for storing data</description>
  </property>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|tika|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|indexer-elastic</value>
  </property>
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>
  <property>
    <name>elastic.host</name>
    <value>10.0.2.41</value>
  </property>
  <property>
    <name>elastic.port</name>
    <value>9300</value>
  </property>
  <property>
    <name>elastic.cluster</name>
    <value>elasticsearch</value>
  </property>
  <property>
    <name>elastic.index</name>
    <value>nutchindex</value>
  </property>
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
  </property>
  <property>
    <name>http.content.limit</name>
    <value>6553600</value>
  </property>
  <property>
    <name>elastic.max.bulk.docs</name>
    <value>250</value>
    <description>Maximum size of the bulk in number of documents.</description>
  </property>
  <property>
    <name>elastic.max.bulk.size</name>
    <value>2500500</value>
    <description>Maximum size of the bulk in bytes.</description>
  </property>
</configuration>
Simple test
nutch@45883500b170:~/apache-nutch-2.3.1/runtime/local/bin$ ./nutch inject ~/nutch/testseed
InjectorJob: starting at 2016-08-30 09:48:49
InjectorJob: Injecting urlDir: /home/nutch/nutch/testseed
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 0
Injector: finished at 2016-08-30 09:48:58, elapsed: 00:00:08
Crawling the web and indexing with Elasticsearch
9300 - Elasticsearch native Java transport port
9200 - RESTful API port
nutch@45883500b170:~$ cat seed/urls.txt
https://en.wikipedia.org
nutch@45883500b170:~$ nutch inject seed/urls.txt
InjectorJob: starting at 2016-08-30 10:24:37
InjectorJob: Injecting urlDir: seed/urls.txt
InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 0
InjectorJob: total number of urls injected after normalization and filtering: 1
Injector: finished at 2016-08-30 10:24:41, elapsed: 00:00:03
nutch@45883500b170:~$ nutch generate -topN 40
GeneratorJob: starting at 2016-08-30 10:25:02
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: normalizing: true
GeneratorJob: topN: 40
GeneratorJob: finished at 2016-08-30 10:25:07, time elapsed: 00:00:04
GeneratorJob: generated batch id: 1472552702-144817008 containing 1 URLs
nutch@45883500b170:~$ nutch fetch -all
FetcherJob: starting at 2016-08-30 10:25:16
FetcherJob: fetching all
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
fetching https://en.wikipedia.org/ (queue crawl delay=5000ms)
-finishing thread FetcherThread2, activeThreads=8
-finishing thread FetcherThread6, activeThreads=6
-finishing thread FetcherThread5, activeThreads=5
-finishing thread FetcherThread7, activeThreads=7
-finishing thread FetcherThread8, activeThreads=7
-finishing thread FetcherThread4, activeThreads=4
-finishing thread FetcherThread3, activeThreads=3
-finishing thread FetcherThread1, activeThreads=2
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
-finishing thread FetcherThread9, activeThreads=9
-finishing thread FetcherThread1, activeThreads=8
-finishing thread FetcherThread2, activeThreads=7
-finishing thread FetcherThread0, activeThreads=1
-finishing thread FetcherThread7, activeThreads=2
-finishing thread FetcherThread6, activeThreads=3
-finishing thread FetcherThread5, activeThreads=4
-finishing thread FetcherThread4, activeThreads=5
-finishing thread FetcherThread3, activeThreads=6
-finishing thread FetcherThread8, activeThreads=0
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: finished at 2016-08-30 10:25:32, time elapsed: 00:00:16
nutch@45883500b170:~$
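The transcript stops after the fetch step (parse, updatedb, and index follow the same pattern). Once documents reach the nutchindex index configured in nutch-site.xml, they can be checked over Elasticsearch's RESTful port 9200. A minimal sketch in Python with requests; the field names assume the output of Nutch's index-basic plugin:

import requests

# _search on port 9200 is the standard Elasticsearch REST endpoint;
# 'nutchindex' matches elastic.index in nutch-site.xml above
resp = requests.get('http://10.0.2.41:9200/nutchindex/_search',
                    params={'q': 'wikipedia', 'size': 5})
for hit in resp.json().get('hits', {}).get('hits', []):
    src = hit['_source']
    print(src.get('title'), src.get('url'))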
Labels: Apache HBase, Apache Nutch, Elasticsearch