You would need to use the nose2.plugins.junitxml plugin. You can configure nose2 with a config file like you would normally do, or with the --plugin command-line option.
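For instance, a minimal nose2 config sketch (the section and option names below are what I recall from the nose2 docs, so double-check them against your version):

[unittest]
plugins = nose2.plugins.junitxml

[junit-xml]
always-on = True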
I would second using nose. Basic XML reporting is now built in. Just use the --with-xunit command-line option and it will produce a nosetests.xml file. For example:
nosetests --with-xunit
Then add a "Publish JUnit test result report" post-build action, and fill in the "Test report XMLs" field with nosetests.xml (assuming you ran nosetests in $WORKSPACE).
I'm currently playing with Hudson, but it is very Java-centric (although with this guide, I found it easier to set up than Buildbot, and it produced more info).
Basically: is there any continuous integration system aimed at Python that produces lots of shiny graphs and the like?
Update: Since this was written, the Jenkins project has replaced Hudson as the community version of the package. The original authors have moved to that project as well. Jenkins is now a standard package on Ubuntu/Debian, RedHat/Fedora/CentOS, and others. The update below is still essentially correct; only the starting point for doing this with Jenkins is different.
Update: After trying a few alternatives, I think I'll stick with Hudson. Integrity was nice and simple, but quite limited. I think Buildbot is better suited to having numerous build slaves rather than everything running on a single machine, which is how I was using it.
Setting Hudson up for a Python project was pretty simple:
Violations to parse the PyLint output (you can set up warning thresholds and graph the number of violations over each build)
Cobertura can parse the coverage.py output. nosetests can gather coverage while running your tests, using nosetests --with-coverage (this writes the output to **/coverage.xml); the two commands involved are sketched below.
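A minimal sketch of those two steps, mirroring the Hudson build script further down (the package name is a placeholder):

nosetests my_package --with-xunit --with-coverage --cover-package my_package --cover-erase
coverage xml -i    # writes coverage.xml for the Cobertura plugin to pick up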
Don't know if it would do the job: Bitten is made by the guys who write Trac and is integrated with Trac. Apache Gump is the CI tool used by Apache; it is written in Python.
We've had great success with TeamCity as our CI server and with nose as our test runner. The TeamCity plugin for nosetests gives you a pass/fail count and a readable display for failed tests (which can be e-mailed). You can even see the details of test failures while your stack is running.
It of course supports things like running on multiple machines, and it's much simpler to set up and maintain than Buildbot.
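A rough sketch of the nose side of that setup, assuming the plugin in question is the teamcity-messages package (the package name is my recollection, so verify it):

pip install nose teamcity-messages
nosetests my.package    # when run under a TeamCity build agent, the plugin reports each test back to the build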
I guess this thread is quite old, but here is my take on it with Hudson:
I decided to go with pip and set up a repo (the painful-to-get-working but nice-looking eggbasket), which Hudson auto-uploads to on successful tests. Here is my rough-and-ready script, for use with a Hudson config execute-script step like /var/lib/hudson/venv/main/bin/hudson_script.py -w $WORKSPACE -p my.package -v $BUILD_NUMBER; just put **/coverage.xml, pylint.txt and nosetests.xml in the config bits:
#!/var/lib/hudson/venv/main/bin/python
import os
import re
import subprocess
import logging
import optparse
logging.basicConfig(level=logging.INFO,
format='%(asctime)s %(levelname)s %(message)s')
#venvDir = "/var/lib/hudson/venv/main/bin/"
UPLOAD_REPO = "http://ldndev01:3442"
def call_command(command, cwd, ignore_error_code=False):
try:
logging.info("Running: %s" % command)
status = subprocess.call(command, cwd=cwd, shell=True)
if not ignore_error_code and status != 0:
raise Exception("Last command failed")
return status
except:
logging.exception("Could not run command %s" % command)
raise
def main():
usage = "usage: %prog [options]"
parser = optparse.OptionParser(usage)
parser.add_option("-w", "--workspace", dest="workspace",
help="workspace folder for the job")
parser.add_option("-p", "--package", dest="package",
help="the package name i.e., back_office.reconciler")
parser.add_option("-v", "--build_number", dest="build_number",
help="the build number, which will get put at the end of the package version")
options, args = parser.parse_args()
if not options.workspace or not options.package:
raise Exception("Need both args, do --help for info")
venvDir = options.package + "_venv/"
#find out if venv is there
if not os.path.exists(venvDir):
#make it
call_command("virtualenv %s --no-site-packages" % venvDir,
options.workspace)
#install the venv/make sure its there plus install the local package
call_command("%sbin/pip install -e ./ --extra-index %s" % (venvDir, UPLOAD_REPO),
options.workspace)
#make sure pylint, nose and coverage are installed
call_command("%sbin/pip install nose pylint coverage epydoc" % venvDir,
options.workspace)
#make sure we have an __init__.py
#this shouldn't be needed if the packages are set up correctly
#modules = options.package.split(".")
#if len(modules) > 1:
# call_command("touch '%s/__init__.py'" % modules[0],
# options.workspace)
#do the nosetests
test_status = call_command("%sbin/nosetests %s --with-xunit --with-coverage --cover-package %s --cover-erase" % (venvDir,
options.package.replace(".", "/"),
options.package),
options.workspace, True)
#produce coverage report -i for ignore weird missing file errors
call_command("%sbin/coverage xml -i" % venvDir,
options.workspace)
#move it so that the code coverage plugin can find it
call_command("mv coverage.xml %s" % (options.package.replace(".", "/")),
options.workspace)
#run pylint
call_command("%sbin/pylint --rcfile ~/pylint.rc -f parseable %s > pylint.txt" % (venvDir,
options.package),
options.workspace, True)
#remove old dists so we only have the newest at the end
call_command("rm -rfv %s" % (options.workspace + "/dist"),
options.workspace)
#if the build passes upload the result to the egg_basket
if test_status == 0:
logging.info("Success - uploading egg")
upload_bit = "upload -r %s/upload" % UPLOAD_REPO
else:
logging.info("Failure - not uploading egg")
upload_bit = ""
#create egg
call_command("%sbin/python setup.py egg_info --tag-build=.0.%s --tag-svn-revision --tag-date sdist %s" % (venvDir,
options.build_number,
upload_bit),
options.workspace)
call_command("%sbin/epydoc --html --graph all %s" % (venvDir, options.package),
options.workspace)
logging.info("Complete")
if __name__ == "__main__":
main()
When it comes to deploying stuff you can do something like:
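A rough sketch of such a deploy step, reusing the package index from the script above (the package name and version pin are placeholders):

pip install my.package==1.0.123 --extra-index-url http://ldndev01:3442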
This stuff assumes you have a repo structure per package with a setup.py and dependencies all set up; then you can just check out the trunk and run this stuff on it.
I hope this helps someone out.
----- update -----
I've added epydoc, which fits in really nicely with Hudson. Just add the Javadoc publisher to your config, pointed at the html folder.
Note that pip doesn't support the -E flag properly these days, so you have to create your venv separately.
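That is, something along these lines (mirroring what the script above already does):

virtualenv my_package_venv --no-site-packages
my_package_venv/bin/pip install -e ./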
If you're considering a hosted CI solution, and doing open source, you should look into Travis CI as well; it has very nice integration with GitHub. While it started as a Ruby tool, they added Python support a while ago.
Continuum's Binstar is now able to trigger builds from GitHub and can compile for Linux, OS X and Windows (32/64-bit). The neat thing is that it really allows you to closely couple distribution and continuous integration. That's crossing the t's and dotting the i's of integration. The site, workflow and tools are really polished, and AFAIK conda is the most robust and Pythonic way of distributing complex Python modules, where you need to wrap and distribute C/C++/Fortran libraries.
We have used Bitten quite a bit. It is pretty and integrates well with Trac, but it is a pain in the butt to customize if you have any nonstandard workflow. Also, there just aren't as many plugins as there are for the more popular tools. Currently we are evaluating Hudson as a replacement.
Check rultor.com. As this article explains, it uses Docker for every build. Thanks to that, you can configure whatever you like inside your Docker image, including Python.
Little disclaimer: I've actually had to build a solution like this for a client that wanted a way to automatically test and deploy any code on a git push, plus manage the issue tickets via git notes. This also led to my work on the AIMS project.
One could easily just set up a bare node system that has a build user and manage its builds through make(1), expect(1), crontab(1)/systemd.unit(5), and incrontab(1). One could even go a step further and use Ansible and Celery for distributed builds with a GridFS/NFS file store.
Although, I would not expect anyone other than a graybeard UNIX guy or a principal-level engineer/architect to actually go this far. It just makes for a nice idea and a potential learning experience, since a build server is nothing more than a way to arbitrarily execute scripted tasks in an automated fashion.
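For the curious, a minimal sketch of that bare-bones idea, assuming a build user with a checked-out working copy and a Makefile that provides test and deploy targets (all names here are placeholders):

# crontab(1) entry for the build user: poll and build every five minutes
*/5 * * * * cd $HOME/myproject && git pull -q && make test deploy >> $HOME/build.log 2>&1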
> LookupError:
> *********************************************************************
>   Resource 'tokenizers/punkt/english.pickle' not found.  Please use
>   the NLTK Downloader to obtain the resource: nltk.download().
>   Searched in:
>     - 'C:\\Users\\Martinos/nltk_data'
>     - 'C:\\nltk_data'
>     - 'D:\\nltk_data'
>     - 'E:\\nltk_data'
>     - 'E:\\Python26\\nltk_data'
>     - 'E:\\Python26\\lib\\nltk_data'
>     - 'C:\\Users\\Martinos\\AppData\\Roaming\\nltk_data'
> *********************************************************************
I had this same problem. Go into a python shell and type:
>>> import nltk
>>> nltk.download()
Then an installation window appears. Go to the 'Models' tab and select 'punkt' under the 'Identifier' column. Then click Download and it will install the necessary files. Then it should work!
import nltk
nltk.download('punkt')
from nltk import word_tokenize, sent_tokenize
You can download the tokenizers by passing punkt as an argument to the download function. The word and sentence tokenizers are then available on nltk.
If you want to download everything, i.e. chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, and tokenizers, do not pass any arguments, as sketched below.
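A minimal sketch of that no-argument call (with no arguments, nltk.download() opens the interactive downloader, from which you can fetch everything):

import nltk
nltk.download()    # opens the NLTK downloader; pick "all" to grab every package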
# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')
# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize
sentences = [
"Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
"Professor Plum has a green plant in his study.",
"Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]
sentences_tokenized = []
for s in sentences:
sentences_tokenized.append(word_tokenize(s))
sentences_tokenized is a list of lists of tokens:
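Roughly, it looks like this (abbreviated here; the exact tokenization may vary slightly between NLTK versions):

[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', ...],
 ['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
 ['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', ...]]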
NLTK has pre-trained tokenizer models. The models are downloaded from internally predefined web sources and stored in the data path of the installed nltk package when executing either of the following function calls.
E.g. 1
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
E.g. 2
nltk.download('punkt')
If you call either of the above in your code, make sure you have an internet connection and no firewall blocking the download.
I would like to share a better alternative way to resolve the above issue, with a deeper understanding.
Please follow these steps and enjoy English word tokenization using nltk.
Step 1: First download the punkt.zip archive, which contains the 'english.pickle' model, from the NLTK data site.
Step 2: Extract the downloaded punkt.zip file, find the 'english.pickle' file inside it, and place it on the C: drive.
Step 3: Copy and paste the following code and execute it.
from nltk.data import load
from nltk.tokenize.treebank import TreebankWordTokenizer
sentences = [
"Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
"Professor Plum has a green plant in his study.",
"Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]
tokenizer = load('file:C:/english.pickle')
treebank_word_tokenize = TreebankWordTokenizer().tokenize
wordToken = []
for sent in sentences:
subSentToken = []
for subSent in tokenizer.tokenize(sent):
subSentToken.extend([token for token in treebank_word_tokenize(subSent)])
wordToken.append(subSentToken)
for token in wordToken:
print token
I came across this problem when I was trying to do POS tagging in NLTK.
The way I got it to work was by making a new directory named 'taggers', alongside the corpora directory, and copying max_pos_tagger into it.
Hope it works for you too. Best of luck with it!
In Spyder, go to your active shell and download nltk using the two commands below.
import nltk
nltk.download()
Then you should see the NLTK downloader window open. Go to the 'Models' tab in this window, click on 'punkt', and download 'punkt'.
The punkt tokenizer data is quite large at over 35 MB; this can be a big deal if, like me, you are running nltk in an environment such as Lambda that has limited resources.
If you only need one or a few language tokenizers, you can drastically reduce the size of the data by only including the .pickle files for those languages.
If you only need to support English, your nltk data size can be reduced to 407 KB (for the Python 3 version).
Somewhere in your environment, create the folders nltk_data/tokenizers/punkt; if you are using Python 3, add another folder PY3 so that your new directory structure looks like nltk_data/tokenizers/punkt/PY3. In my case I created these folders at the root of my project.
Extract the zip and move the .pickle files for the languages you want to support into the punkt folder you just created. Note: Python 3 users should use the pickles from the PY3 folder. With your language files in place, the layout should look something like the sketch below.
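A sketch of the resulting layout, assuming only English is kept (for Python 3; Python 2 users would put english.pickle directly under punkt):

nltk_data/
    tokenizers/
        punkt/
            PY3/
                english.pickle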
Now you just need to add your nltk_data folder to the search paths, assuming your data is not in one of the pre-defined search paths. You can do this either with the environment variable NLTK_DATA='path/to/your/nltk_data', or by adding a custom path at runtime in Python:
from nltk import data
data.path += ['/path/to/your/nltk_data']
NOTE: If you don't need to load the data at runtime or bundle the data with your code, it would be best to create your nltk_data folders at the built-in locations that nltk looks in.
In Python-3.6 I can see the suggestion in the traceback. That’s quite helpful.
Hence I would say: pay attention to the error you get; most of the time the answer is within the problem itself ;).
And then, as suggested by other folks here, you can install the data on the fly, either from the Python terminal or with a command like python -c "import nltk; nltk.download('wordnet')".
You just need to run that command once and then it will save the data locally in your home directory.
I had a similar issue when using an assigned folder for multiple downloads, and I had to append the data path manually.
A single download can be achieved as follows (this works):
import os as _os
from nltk.corpus import stopwords
from nltk import download as nltk_download
nltk_download('stopwords', download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)
stop_words: list = stopwords.words('english')
This code works, meaning that nltk remembers the download path passed in the download function. On the other hand, if I download a subsequent package, I get a similar error to the one described by the user:
Multiple downloads raise an error:
import os as _os
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk import download as nltk_download
nltk_download(['stopwords', 'punkt'], download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)
print(stopwords.words('english'))
print(word_tokenize("I am trying to find the download path 99."))
Error:
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('punkt')
Now, if I append my download path to the nltk data path, it works:
import os as _os
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk import download as nltk_download
from nltk.data import path as nltk_path
nltk_path.append( _os.path.join(get_project_root_path(), 'temp'))
nltk_download(['stopwords', 'punkt'], download_dir=_os.path.join(get_project_root_path(), 'temp'), raise_on_error=True)
print(stopwords.words('english'))
print(word_tokenize("I am trying to find the download path 99."))
This works... Not sure why it works in one case but not the other, but the error message seems to imply that it doesn't check the download folder the second time.
NB: using Windows 8.1 / Python 3.7 / NLTK 3.5