Python 实用宝典

Question 1

I would like to know which programming language is better for natural language processing: Java or Python? I have found lots of questions and answers about it, but I am still unsure which one to choose.

And I want to know which NLP library to use for Java, since there are lots of libraries (LingPipe, GATE, OpenNLP, StanfordNLP). For Python, most programmers recommend NLTK.

But if I am to do some text processing or information extraction from unstructured data (just free-form plain English text) to get some useful information, what is the best option? Java or Python? Which library is suitable?

Updated

What I want to do is extract useful product information from unstructured data (e.g. users post advertisements for mobiles or laptops in many different forms, in not very standard English).

Question 2

Java vs Python for NLP is very much a preference or necessity. Depending on the company/projects you’ll need to use one or the other and often there isn’t much of a choice unless you’re heading a project.

Other than NLTK (www.nltk.org), there are actually other libraries for text processing in python:

TextBlob: http://textblob.readthedocs.org/en/dev/
Gensim: http://radimrehurek.com/gensim/
Pattern: http://www.clips.ua.ac.be/pattern
spaCy: http://spacy.io
Orange: http://orange.biolab.si/features/
PyNLPl (pronounced "pineapple"): https://github.com/proycon/pynlpl

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=natural+language+processing&submit=search)

For Java, there’re tonnes of others but here’s another list:

Freeling: http://nlp.lsi.upc.edu/freeling/
OpenNLP: http://opennlp.apache.org/
LingPipe: http://alias-i.com/lingpipe/
Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/ (comes with wrappers for other languages, python included)
CogComp NLP: https://github.com/CogComp/cogcomp-nlp

This is a nice comparison for basic string processing, see http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html

A useful comparison of GATE vs UIMA vs OpenNLP, see https://www.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP?version=4

If you’re uncertain which language to go for in NLP, personally I say, “any language that will give you the desired analysis/output”; see Which language or tools to learn for natural language processing?

Here’s a pretty recent (2017) list of NLP tools: https://github.com/alvations/awesome-community-curated-nlp

An older list of NLP tools (2013): http://web.archive.org/web/20130703190201/http://yauhenklimovich.wordpress.com/2013/05/20/tools-nlp

Other than language processing tools, you would very much need machine learning tools to incorporate into NLP pipelines.

There’s a whole range in Python and Java, and once again it’s up to preference and whether the libraries are user-friendly enough:

Machine Learning libraries in python:

Sklearn (Scikit-learn): http://scikit-learn.org/stable/
Milk: http://luispedro.org/software/milk
Scipy: http://www.scipy.org/
Theano: http://deeplearning.net/software/theano/
PyML: http://pyml.sourceforge.net/
pyBrain: http://pybrain.org/
Graphlab Create (Commercial tool but free academic license for 1 year): https://dato.com/products/create/

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=machine+learning&submit=search)

Machine Learning libraries in Java:

Weka: http://www.cs.waikato.ac.nz/ml/weka/index.html
Mallet: http://mallet.cs.umass.edu/
Mahout: https://mahout.apache.org/

With the recent (2015) deep learning tsunami in NLP, possibly you could consider: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software

I’ll avoid listing deep learning tools out of non-favoritism / neutrality.

Other Stackoverflow questions that also asked for NLP/ML tools:

Question 3

The question is very open ended. That said, rather than choose one, below is a comparison depending on the language that you would like to use (since there are good libraries available in both languages).

Python

In terms of Python, the first place you should look at is the Python Natural Language Toolkit. As they note in their description, NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

There is also some excellent code that you can look up that originated out of Google’s Natural Language Toolkit project that is Python based. You can find a link to that code here on GitHub.

Java

The first place to look would be Stanford’s Natural Language Processing Group. All of the software that is distributed there is written in Java. All recent distributions require Oracle Java 6+ or OpenJDK 7+. Distribution packages include components for command-line invocation, jar files, a Java API, and source code.

Another great option that you see in a lot of machine learning environments here (general option), is Weka. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

Question 4

I tend to use SQLite when doing Django development, but on a live server something more robust is often needed (MySQL/PostgreSQL, for example). Invariably, there are other changes to make to the Django settings as well: different logging locations / intensities, media paths, etc.

How do you manage all these changes to make deployment a simple, automated process?

Question 5

Update: django-configurations has been released which is probably a better option for most people than doing it manually.

If you would prefer to do things manually, my earlier answer still applies:

I have multiple settings files.

settings_local.py – host-specific configuration, such as database name, file paths, etc.
settings_development.py – configuration used for development, e.g. DEBUG = True.
settings_production.py – configuration used for production, e.g. SERVER_EMAIL.

I tie these all together with a settings.py file that firstly imports settings_local.py, and then one of the other two. It decides which to load by two settings inside settings_local.py – DEVELOPMENT_HOSTS and PRODUCTION_HOSTS. settings.py calls platform.node() to find the hostname of the machine it is running on, and then looks for that hostname in the lists, and loads the second settings file depending on which list it finds the hostname in.

That way, the only thing you really need to worry about is keeping the settings_local.py file up to date with the host-specific configuration, and everything else is handled automatically.

Check out an example here.
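The hostname lookup described above might be sketched roughly like this. It is only an illustration: the host names are made up, and pick_settings_module is a hypothetical helper, not part of Django or of the original answer's code.

```python
import platform

# Hypothetical host lists; in the answer these live in settings_local.py.
DEVELOPMENT_HOSTS = ["dev-laptop", "dev-desktop"]
PRODUCTION_HOSTS = ["web1", "web2"]


def pick_settings_module(hostname, development_hosts, production_hosts):
    """Return the name of the second settings module to load for this host."""
    if hostname in development_hosts:
        return "settings_development"
    if hostname in production_hosts:
        return "settings_production"
    # Unknown host: fail loudly rather than guess.
    raise RuntimeError("Host %r is in neither host list" % hostname)


# In settings.py one would pass platform.node() as the hostname, e.g.:
# module_name = pick_settings_module(
#     platform.node(), DEVELOPMENT_HOSTS, PRODUCTION_HOSTS)
```

Raising on an unknown host is a design choice of this sketch; silently falling back to development settings would also be possible, but it hides a misconfigured production machine.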

Question 6

Personally, I use a single settings.py for the project; I just have it look up the hostname it’s on (my development machines have hostnames that start with “gabriel”), so I just have this:

import socket
if socket.gethostname().startswith('gabriel'):
    LIVEHOST = False
else: 
    LIVEHOST = True

then in other parts I have things like:

if LIVEHOST:
    DEBUG = False
    PREPEND_WWW = True
    MEDIA_URL = 'http://static1.grsites.com/'
else:
    DEBUG = True
    PREPEND_WWW = False
    MEDIA_URL = 'http://localhost:8000/static/'

and so on. A little bit less readable, but it works fine and saves having to juggle multiple settings files.

Question 7

At the end of settings.py I have the following:

try:
    from settings_local import *
except ImportError:
    pass

This way if I want to override default settings I need to just put settings_local.py right next to settings.py.

Question 8

I have two files. settings_base.py which contains common/default settings, and which is checked into source control. Each deployment has a separate settings.py, which executes from settings_base import * at the beginning and then overrides as needed.

Question 9

The most simplistic way I found was:

1) use the default settings.py for local development and 2) create a production-settings.py starting with:

import os
from settings import *

And then just override the settings that differ in production:

DEBUG = False
TEMPLATE_DEBUG = DEBUG


DATABASES = {
    'default': {
           ....
    }
}

Question 10

Somewhat related, for the issue of deploying Django itself with multiple databases, you may want to take a look at Djangostack. You can download a completely free installer that allows you to install Apache, Python, Django, etc. As part of the installation process we allow you to select which database you want to use (MySQL, SQLite, PostgreSQL). We use the installers extensively when automating deployments internally (they can be run in unattended mode).

Question 11

I have my settings.py file in an external directory. That way, it doesn’t get checked into source control, or over-written by a deploy. I put this in the settings.py file under my Django project, along with any default settings:

import sys
import os.path

def _load_settings(path):
    print("Loading configuration from %s" % path)
    if os.path.exists(path):
        settings = {}
        # Run the file into a separate dict, then copy the names into
        # this module's globals manually.
        # (Python 3 has no execfile; exec(compile(...)) is the equivalent.)
        with open(path) as f:
            exec(compile(f.read(), path, "exec"), globals(), settings)
        for setting in settings:
            globals()[setting] = settings[setting]

_load_settings("/usr/local/conf/local_settings.py")

Note: This is very dangerous if you can’t trust local_settings.py.

Question 12

In addition to the multiple settings files mentioned by Jim, I also tend to place two settings at the top of my settings.py file: BASE_DIR and BASE_URL, set to the path of the code and the URL of the base of the site. All other settings are modified to append themselves to these.

BASE_DIR = "/home/sean/myapp/"
e.g. MEDIA_ROOT = "%smedia/" % BASE_DIR

So when moving the project I only have to edit these settings and not search the whole file.
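A minimal sketch of the same idea using os.path.join instead of string formatting. The path is the one from the answer; TEMPLATE_DIR is a hypothetical setting name added only to show a second derived path.

```python
import os.path

# Only this line changes when the project moves.
BASE_DIR = "/home/sean/myapp/"

# Every other path is derived from BASE_DIR.
MEDIA_ROOT = os.path.join(BASE_DIR, "media")
TEMPLATE_DIR = os.path.join(BASE_DIR, "templates")  # hypothetical name
```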

I would also recommend looking at Fabric and Capistrano (a Ruby tool, but it can be used to deploy Django applications), which facilitate automation of remote deployment.

Question 13

Well, I use this configuration:

At the end of settings.py:

#settings.py
try:
    from locale_settings import *
except ImportError:
    pass

And in locale_settings.py:

#locale_settings.py
class Settings(object):

    def __init__(self):
        import settings
        self.settings = settings

    def __getattr__(self, name):
        return getattr(self.settings, name)

settings = Settings()

INSTALLED_APPS = settings.INSTALLED_APPS + (
    'gunicorn',)

# Delete duplicate settings maybe not needed, but I prefer to do it.
del settings
del Settings

Question 14

So many complicated answers!

Every settings.py file comes with:

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

I use that directory to set the DEBUG variable like this (replace the path with the directory where your dev code is):

DEBUG = False
if BASE_DIR == "/path/to/my/dev/dir":
    DEBUG = True

Then, every time the settings.py file is moved, DEBUG will be False and it’s your production environment.

Every time you need different settings than the ones in your dev environment just use:

if DEBUG:
    pass  # Debug settings go here
else:
    pass  # Release settings go here

Question 15

I think it depends on the size of the site as to whether you need to step up from using SQLite, I’ve successfully used SQLite on several smaller live sites and it runs great.

Question 16

I use environment:

import os

if os.environ.get('WEB_MODE') == 'production':
    from settings_production import *
else:
    from settings_dev import *

I believe this is a much better approach, because eventually you need special settings for your test environment, and you can easily add it to this condition.
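To illustrate how a test environment could be added to that condition, here is a small sketch that maps WEB_MODE values to settings module names. The mapping, the settings_test name and the settings_module_for helper are assumptions made for this example only.

```python
import os

# Hypothetical mapping from WEB_MODE values to settings module names.
SETTINGS_BY_MODE = {
    "production": "settings_production",
    "test": "settings_test",
}


def settings_module_for(environ):
    """Pick a settings module name from WEB_MODE, defaulting to development."""
    mode = environ.get("WEB_MODE")
    return SETTINGS_BY_MODE.get(mode, "settings_dev")


# Example: settings_module_for(os.environ)
```

Adding another environment then means adding one dictionary entry rather than another elif branch.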

Question 17

This is an older post but I think if I add this useful library it will simplify things.

Use django-configurations

Quickstart

pip install django-configurations

Then subclass the included configurations.Configuration class in your project’s settings.py or any other module you’re using to store the settings constants, e.g.:

# mysite/settings.py

from configurations import Configuration

class Dev(Configuration):
    DEBUG = True

Set the DJANGO_CONFIGURATION environment variable to the name of the class you just created, e.g. in ~/.bashrc:

export DJANGO_CONFIGURATION=Dev

and the DJANGO_SETTINGS_MODULE environment variable to the module import path as usual, e.g. in bash:

export DJANGO_SETTINGS_MODULE=mysite.settings

Alternatively supply the --configuration option when using Django management commands along the lines of Django’s default --settings command line option, e.g.:

python manage.py runserver --settings=mysite.settings --configuration=Dev

To enable Django to use your configuration you now have to modify your manage.py or wsgi.py script to use django-configurations’ versions of the appropriate starter functions, e.g. a typical manage.py using django-configurations would look like this:

#!/usr/bin/env python

import os
import sys

if __name__ == "__main__":
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')
    os.environ.setdefault('DJANGO_CONFIGURATION', 'Dev')

    from configurations.management import execute_from_command_line

    execute_from_command_line(sys.argv)

Notice in line 10 we don’t use the common tool django.core.management.execute_from_command_line but instead configurations.management.execute_from_command_line.

The same applies to your wsgi.py file, e.g.:

import os

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')
os.environ.setdefault('DJANGO_CONFIGURATION', 'Dev')

from configurations.wsgi import get_wsgi_application

application = get_wsgi_application()

Here we don’t use the default django.core.wsgi.get_wsgi_application function but instead configurations.wsgi.get_wsgi_application.

That’s it! You can now use your project with manage.py and your favorite WSGI enabled server.

Question 18

In fact you should probably consider having the same (or almost the same) configs for your development and production environment. Otherwise, situations like “Hey, it works on my machine” will happen from time to time.

So in order to automate your deployment and eliminate those WOMM issues, just use Docker.

Question 19

What are the relative merits / downsides of various Python bundles (EPD / Anaconda) vs. a manual install?

I have installed EPD academic, and I have no issues with it. It provides more packages than I think I will ever need, and it is very easy to update using enpkg enstaller. The EPD academic licence requires yearly renewal, however, and the free version does not do updates as easily.

At the moment I really only use a handful of packages such as Pandas, NumPy, SciPy, matplotlib, IPython, Statsmodels and their respective dependencies.

For such limited use am I better off with manual install and pip install --upgrade 'package' or do the bundles offer anything over and above this?

Question 20

Update 2015: Nowadays I always recommend Anaconda. It includes lots of Python packages for scientific computing, data science, web development, etc. It also provides a superior environment tool, conda, which allows you to easily switch between environments, even between Python 2 and 3. It is also updated very quickly as soon as a new version of a package is released, and you can just do conda update packagename to update it.

Original answer below:

On Windows, what is complicated is to compile the math packages, so I think a manual install is a viable option only if you are interested only in Python, without other packages.

It is therefore better to choose either EPD (now Canopy) or Anaconda.

Anaconda has around 270 packages, including the most important for most scientific applications and data analysis, that is, NumPy, SciPy, Pandas, IPython, matplotlib, Scikit-learn. So if this is enough for you, I would choose Anaconda.

Instead, if you are interested in other packages, and even more if you use any of the Enthought packages (Chaco for example is very useful for realtime data visualization), then EPD/Canopy is probably a better choice. The Academic version has a larger number of packages in the base install, and many more in the repository. Anaconda also includes Chaco.

Question 21

I have tried various Windows distributions in the last year, trying to find one suitable for my work environment (behind a proxy, but without access to proxy configuration).

Here is my feedback from experience:

EPD/Canopy: We had a license of EPD, but it was old and we were unable to update because of the weird proxy situation. In order to add some packages (such as a recent version of xlrd/xlwt), I compiled from source. To update SciPy and NumPy, I used the precompiled installer from http://www.lfd.uci.edu/~gohlke/pythonlibs/, but it would sometimes screw up compatibility. I loved having a fully configured Py2exe and Cython, and it simply worked out of the box.

After a while, I tried installing the free version of Canopy, but it lacks Cython and py2exe and some specific advanced packages I needed, so I never really used it. Some of my colleagues bought the full Canopy license, but we’re still not sure how they’re going to update…

Python(x,y): Not wanting to struggle with licenses, I installed Python(x,y) at home. The only downside I noticed right now is that the standard installation requires you to select which packages you want. It’s both a good and a bad point, because I can’t be sure that my clients will have the exact same configuration as I do when I install. (The Enthought tool suite can be installed in Python(x,y).) After using Python(x,y) for a while, I just noticed I installed the 32 bit version. Although it is not clear on their website, it seems they don’t have a 64 bit version as of July 2015. I’m going to uninstall it and get a 64 bit distribution.

Anaconda: When I first wrote this, Anaconda didn’t seem to have enough packages yet. A couple of years later, it seems much better, I’m going to give it a try!

Manual: In order to avoid version compatibility issues with our old EPD version, I ended up using manual Python installation and adding additional packages from the LFD website linked above. It works great, but I would still suggest Canopy to a new user who requires advanced packages (like GDAL or PyFITS).

Summary: If you go for Canopy, get the full licence (Academic or purchased). Else, go with Python(x,y), it will end up being the same.

On Ubuntu: No need for a distribution. It’s all relatively recent (+/- 6 months is tolerable) and pre-compiled. You just need to execute sudo apt-get install python python-scipy and it’s there! Most advanced packages are there as well.

Question 22

The other answers cover the ground quite nicely, so I just want to remark on one particular aspect that nobody has mentioned yet. It is probably fairly niche, but it may potentially make or break Anaconda or Canopy for some people under Linux systems:

Anaconda Python builds use the UCS4 Unicode mode, whereas Enthought Canopy uses UCS2.

What this means in practical terms is that if you rely on any extensions which you cannot compile yourself for whatever reason (e.g. pre-compiled proprietary libraries), if they happen not to be built for a Python version with the same mode, you may sooner or later run into errors that look something like undefined symbol: PyUnicodeUCS4_AsUTF8String.

According to PEP 513, UCS4 currently seems to be more popular and recommended. Also, these UCS compatibility issues seem to affect only Python 2.x and 3.x versions before 3.3.
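A quick way to check which mode a given interpreter was built with is to look at sys.maxunicode. The unicode_build helper below is a hypothetical name used only for this sketch.

```python
import sys


def unicode_build():
    """Report whether this interpreter is a narrow (UCS2) or wide (UCS4) build."""
    # Narrow builds cap code points at 0xFFFF; wide builds go up to 0x10FFFF.
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


print(unicode_build())
```

On Python 3.3 and later the narrow/wide distinction no longer exists, so this always reports the wide value there; the check is only informative on 2.x and early 3.x interpreters.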

Question 23

I used Anaconda for years and liked it quite a bit. Unfortunately, IPython Notebook (now Jupyter) is unavailable without the enterprise edition.

I want to use Jupyter notebooks in the classroom, so I switched to Canopy. It seems easy enough to install all of the packages we need. Admittedly, we haven’t tested them all.

Question 24

I’m using Python 3.4 on Windows. When I run a script, it complains

ImportError: No module named 'PyQt4'

So I tried to install it, but pip install PyQt4 gives

Could not find any downloads that satisfy the requirement PyQt4

although it does show up when I run pip search PyQt4. I tried to pip install python-qt, which installed successfully but that didn’t solve the problem.

What am I doing wrong?

Question 25

Here are Windows wheel packages built by Christoph Gohlke – Python Windows Binary packages – PyQt

In the filenames, cp27 means CPython version 2.7, cp35 means CPython 3.5, etc.

Since Qt is a more complicated system with a compiled C++ codebase underlying the python interface it provides you, it can be more complex to build than just a pure python code package, which means it can be hard to install it from source.

Make sure you grab the correct Windows wheel file (python version, 32/64 bit), and then use pip to install it – e.g:

C:\path\where\wheel\is\> pip install PyQt4-4.11.4-cp35-none-win_amd64.whl

Should properly install if you are running an x64 build of Python 3.5.
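To make the filename convention concrete, here is a rough sketch that splits a wheel filename into its parts so the Python tag and platform tag can be compared with the installed interpreter. parse_wheel_filename is a hypothetical helper written for this example, not a pip API, and it ignores edge cases beyond the basic name-version-python-abi-platform layout.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: %r" % filename)
    # The last three fields are the python tag, ABI tag and platform tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("PyQt4-4.11.4-cp35-none-win_amd64.whl")
print(info["python"], info["platform"])  # cp35 win_amd64
```

The tags have to match the installed Python build (version and 32/64 bit), not merely the operating system.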

Question 26

QT no longer supports PyQt4, but you can install PyQt5 with pip:

pip install PyQt5

Question 27

You can’t use pip. You have to download from the Riverbank website and run the installer for your version of python. If there is no install for your version, you will have to install Python for one of the available installers, or build from source (which is rather involved). Other answers and comments have the links.

Question 28

If you install PyQt4 on Windows, files wind up here by default:

C:\Python27\Lib\site-packages\PyQt4\*.*

but it also leaves a file here:

C:\Python27\Lib\site-packages\sip.pyd

If you copy both the sip.pyd file and the PyQt4 folder into your virtualenv, things will work fine.

For example:

mkdir c:\code
cd c:\code
virtualenv BACKUP
cd c:\code\BACKUP\scripts
activate

Then with windows explorer copy from C:\Python27\Lib\site-packages the file (sip.pyd) and folder (PyQt4) mentioned above to C:\code\BACKUP\Lib\site-packages\

Then back at CLI:

cd ..                 
(c:\code\BACKUP)
python backup.py

The problem with trying to launch a script which calls PyQt4 from within virtualenv is that the virtualenv does not have PyQt4 installed and it doesn’t know how to reference the default installation described above. But follow these steps to copy PyQt4 into your virtualenv and things should work great.

Question 29

Earlier, PyQt .exe installers were available directly from the website download page. Now, with the release of PyQt 4.12, the installers have been deprecated. You can make the libraries work by compiling them yourself, but that means going to a great deal of trouble.

Otherwise you can use the previous distributions to solve your purpose. The .exe windows installers can be downloaded from :

https://sourceforge.net/projects/pyqt/files/PyQt4/PyQt-4.11.4/

Question 30

It looks like you may have to do a bit of manual installation for PyQt4.

http://pyqt.sourceforge.net/Docs/PyQt4/installation.html

This might help a bit more; it’s in more of a tutorial/step-by-step format:

http://movingthelamppost.com/blog/html/2013/07/12/installing_pyqt____because_it_s_too_good_for_pip_or_easy_install_.html

Question 31

With current latest python 3.6.5

pip3 install PyQt5

works fine

Question 32

Try this for PyQt5:

pip install PyQt5

Use the operating system on this link for PyQt4.

Or download the supported wheel for your platform on this link.

Else use this link for the windows executable installer. Hopefully this helps you to install either PyQt4 or PyQt5.

Question 33

For Windows:

download the appropriate version of the PyQt4 from here:

https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyqt4

and install it using pip (example for Python3.6 – 64bit)

 pip install PyQt4-4.11.4-cp36-cp36m-win_amd64.whl

Question 34

install PyQt5 for Windows 10 and python 3.5+.

pip install PyQt5

Question 35

If you get this error while installing PyQt4:

Error: PyQt4-4.11.4-cp27-cp27m-win_amd64.whl is not a supported wheel on this platform.

My system is 64-bit, but to solve this error I installed the 32-bit Windows build of PyQt4, i.e. PyQt4-4.11.4-cp27-cp27m-win32.whl (presumably because the installed Python was a 32-bit build) – click here to see more versions.

Kindly select appropriate version of PyQt4 according to your installed python version.

Question 36

You can also use this command to install PyQt5.

pip3 install PyQt5

Question 37

I am using PyCharm, and was able to install PyQt5.

PyQt4, as well as PyQt4Enhanced and windows_whl both failed to install, I’m guessing that’s because Qt4 is no longer supported.

Question 38

How do I convert a DataFrame column containing strings and NaN values to floats? There is also another column whose values are strings and floats; how do I convert this entire column to floats?

Question 39

NOTE: pd.convert_objects has now been deprecated. You should use pd.Series.astype(float) or pd.to_numeric as described in other answers.

This is available in 0.11. It forces conversion (or sets the value to NaN). It works even where astype fails; it is also applied series by series, so it won’t convert, say, a column made up entirely of strings.

In [10]: df = DataFrame(dict(A = Series(['1.0','1']), B = Series(['1.0','foo'])))

In [11]: df
Out[11]: 
     A    B
0  1.0  1.0
1    1  foo

In [12]: df.dtypes
Out[12]: 
A    object
B    object
dtype: object

In [13]: df.convert_objects(convert_numeric=True)
Out[13]: 
   A   B
0  1   1
1  1 NaN

In [14]: df.convert_objects(convert_numeric=True).dtypes
Out[14]: 
A    float64
B    float64
dtype: object

Question 40

You can try df.column_name = df.column_name.astype(float). As for the NaN values, you need to specify how they should be converted, but you can use the .fillna method to do it.

Example:

In [12]: df
Out[12]: 
     a    b
0  0.1  0.2
1  NaN  0.3
2  0.4  0.5

In [13]: df.a.values
Out[13]: array(['0.1', nan, '0.4'], dtype=object)

In [14]: df.a = df.a.astype(float).fillna(0.0)

In [15]: df
Out[15]: 
     a    b
0  0.1  0.2
1  0.0  0.3
2  0.4  0.5

In [16]: df.a.values
Out[16]: array([ 0.1,  0. ,  0.4])

Question 41

In a newer version of pandas (0.17 and up), you can use to_numeric function. It allows you to convert the whole dataframe or just individual columns. It also gives you an ability to select how to treat stuff that can’t be converted to numeric values:

import pandas as pd
s = pd.Series(['1.0', '2', -3])
pd.to_numeric(s)
s = pd.Series(['apple', '1.0', '2', -3])
pd.to_numeric(s, errors='ignore')
pd.to_numeric(s, errors='coerce')
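pd.to_numeric itself works on one column (Series) at a time; one way to cover a whole DataFrame is to apply it column by column. The sketch below uses made-up data and errors="coerce", so anything that cannot be parsed becomes NaN.

```python
import pandas as pd

# Made-up data: both columns are stored as object dtype.
df = pd.DataFrame({
    "a": ["0.1", float("nan"), "0.4"],  # strings and NaN
    "b": ["1", "foo", 3.0],             # strings and floats
})

# Apply to_numeric to each column; unparseable values become NaN.
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
```

Both columns should come out as float64, with NaN where the original value was missing or not a number ("foo").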

Question 42

df['MyColumnName'] = df['MyColumnName'].astype('float64')

Question 43

You have to replace empty strings ('') with np.nan before converting to float, i.e.:

df['a']=df.a.replace('',np.nan).astype(float)

Question 44

Here is an example

                            GHI             Temp  Power Day_Type
2016-03-15 06:00:00 -7.99999952505459e-7    18.3    0   NaN
2016-03-15 06:01:00 -7.99999952505459e-7    18.2    0   NaN
2016-03-15 06:02:00 -7.99999952505459e-7    18.3    0   NaN
2016-03-15 06:03:00 -7.99999952505459e-7    18.3    0   NaN
2016-03-15 06:04:00 -7.99999952505459e-7    18.3    0   NaN

but if this is all string values…as was in my case… Convert the desired columns to floats:

df_inv_29['GHI'] = df_inv_29.GHI.astype(float)
df_inv_29['Temp'] = df_inv_29.Temp.astype(float)
df_inv_29['Power'] = df_inv_29.Power.astype(float)

Your dataframe will now have float values :-)

Question 45

I’m scripting the checkout, build, distribution, test, and commit cycle for a large C++ solution that is using Monotone, CMake, Visual Studio Express 2008, and custom tests.

All of the other parts seem pretty straight-forward, but I don’t see how to compile the Visual Studio solution without getting the GUI.

The script is written in Python, but an answer that would allow me to just make a call to: os.system would do.

Question 46

I know of two ways to do it.

Method 1
The first method (which I prefer) is to use msbuild:

msbuild project.sln /Flags...

Method 2
You can also run:

vcexpress project.sln /build /Flags...

The vcexpress option returns immediately and does not print any output. I suppose that might be what you want for a script.

Note that DevEnv is not distributed with Visual Studio Express 2008 (I spent a lot of time trying to figure that out when I first had a similar issue).

So, the end result might be:

os.system("msbuild project.sln /p:Configuration=Debug")

You’ll also want to make sure your environment variables are correct, as msbuild and vcexpress are not by default on the system path. Either start the Visual Studio build environment and run your script from there, or modify the paths in Python (with os.putenv).
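As an alternative to os.system, the call could be wrapped with subprocess, which passes arguments as a list and returns the exit code. This is a sketch under assumptions: msbuild_command and build are hypothetical helper names, and the msbuild parameter has to point at an executable that is actually reachable (on the PATH, or given as a full path).

```python
import subprocess


def msbuild_command(solution, configuration="Debug", msbuild="msbuild"):
    """Build the argument list for an msbuild invocation."""
    return [msbuild, solution, "/p:Configuration=%s" % configuration]


def build(solution, configuration="Debug", msbuild="msbuild"):
    """Run msbuild and return its exit code (0 means success)."""
    return subprocess.call(msbuild_command(solution, configuration, msbuild))


# Example (only works where msbuild is installed and reachable):
# exit_code = build("project.sln", "Release")
```

Checking the returned exit code lets the surrounding script stop before the distribution and test steps when the build fails.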

Question 47

MSBuild usually works, but I’ve run into difficulties before. You may have better luck with

devenv YourSolution.sln /Build

Question 48

To be honest I have to add my 2 cents.

You can do it with msbuild.exe. There are many versions of msbuild.exe:

C:\Windows\Microsoft.NET\Framework64\v2.0.50727\msbuild.exe
C:\Windows\Microsoft.NET\Framework64\v3.5\msbuild.exe
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\msbuild.exe
C:\Windows\Microsoft.NET\Framework\v2.0.50727\msbuild.exe
C:\Windows\Microsoft.NET\Framework\v3.5\msbuild.exe
C:\Windows\Microsoft.NET\Framework\v4.0.30319\msbuild.exe

Use the version you need. Basically, you should use the latest one:

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\msbuild.exe

So how to do it.

Run the COMMAND window
Input the path to msbuild.exe

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\msbuild.exe

Input the path to the project solution like

"C:\Users\Clark.Kent\Documents\visual studio 2012\Projects\WpfApplication1\WpfApplication1.sln"

Add any flags you need after the solution path.
Press ENTER

Note you can get help about all possible flags like

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\msbuild.exe /help

Question 49

Using msbuild as pointed out by others worked for me but I needed to do a bit more than just that. First of all, msbuild needs to have access to the compiler. This can be done by running:

"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\vcvarsall.bat"

Then msbuild was not in my $PATH so I had to run it via its explicit path:

"C:\Windows\Microsoft.NET\Framework64\v4.0.30319\MSBuild.exe" myproj.sln

Lastly, my project was making use of some variables like $(VisualStudioDir). It seems those do not get set by msbuild so I had to set them manually via the /property option:

"C:\Windows\Microsoft.NET\Framework64\v4.0.30319\MSBuild.exe" /property:VisualStudioDir="C:\Users\Administrator\Documents\Visual Studio 2013" myproj.sln

That line then finally allowed me to compile my project.

Bonus: it seems that the command line tools do not require a registration after 30 days of using them like the “free” GUI-based Visual Studio Community edition does. With the Microsoft registration requirement in place, that version is hardly free. Free-as-in-facebook if anything…

Question 50

MSBuild is your friend.

msbuild "C:\path to solution\project.sln"

Question 51

DEVENV works well in many cases, but on a WIXPROJ to build my WIX installer, all I got is “CATASTROPHIC” error in the Out log.

This works: MSBUILD /Path/PROJECT.WIXPROJ /t:Build /p:Configuration=Release

Question 52

How can I put text in the top left (or top right) corner of a matplotlib figure, e.g. where a top left legend would be, or on top of the plot but in the top left corner? E.g. if it’s a plt.scatter(), then something that would be within the square of the scatter, put in the top left most corner.

Ideally, I’d like to do this without knowing the scale of the scatterplot being plotted, since it will change from data set to data set. I just want the text to be roughly in the upper left, or roughly in the upper right. With legend-type positioning it should not overlap with any scatter plot points anyway.

thanks!

Question 53

You can use text.

text(x, y, s, fontsize=12)

Text coordinates can be given relative to the axis, so the position of your text will be independent of the size of the plot. From the documentation:

The default transform specifies that text is in data coords, alternatively, you can specify text in axis coords (0,0 is lower-left and 1,1 is upper-right). The example below places text in the center of the axes:

text(0.5, 0.5,'matplotlib',
     horizontalalignment='center',
     verticalalignment='center',
     transform = ax.transAxes)

Preventing the text from interfering with any point of your scatter is more difficult, AFAIK. The easiest method is to set the upper y limit (ymax in ylim((ymin, ymax))) to a value a bit higher than the maximum y-coordinate of your points. That way you will always have free space for the text.

EDIT: here you have an example:

In [17]: from pylab import figure, text, scatter, show
In [18]: f = figure()
In [19]: ax = f.add_subplot(111)
In [20]: scatter([3,5,2,6,8],[5,3,2,1,5])
Out[20]: <matplotlib.collections.CircleCollection object at 0x0000000007439A90>
In [21]: text(0.1, 0.9,'matplotlib', ha='center', va='center', transform=ax.transAxes)
Out[21]: <matplotlib.text.Text object at 0x0000000007415B38>
In [22]:

The ha and va parameters set the alignment of your text relative to the insertion point. For example, ha='left' is a good setting to prevent a long text from running past the left axis when the frame is reduced (made narrower) manually.
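Putting the pieces of this answer together for the top-left case, here is a minimal sketch. The offsets 0.02 and 0.98 and the 15% headroom are arbitrary values chosen for illustration, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([3, 5, 2, 6, 8], [5, 3, 2, 1, 5])

# Axes coordinates: (0, 0) is the lower-left and (1, 1) the upper-right corner,
# so (0.02, 0.98) sits just inside the top-left corner whatever the data scale.
label = ax.text(0.02, 0.98, "matplotlib", ha="left", va="top",
                transform=ax.transAxes)

# Leave some headroom above the highest point so the text has free space.
ymin, ymax = ax.get_ylim()
ax.set_ylim(ymin, ymax + 0.15 * (ymax - ymin))
```

For the top-right corner, the same call with x=0.98 and ha="right" should work.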

Question 54

One solution would be to use the plt.legend function, even if you don’t want an actual legend. You can specify the placement of the legend box by using the loc keyword. More information can be found at this website, but I’ve also included an example showing how to place a legend:

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# xa..xd and ya..yd are your own data; ax is an existing Axes (e.g. from plt.subplots()).
ax.scatter(xa, ya, marker='o', s=20, c="lightgreen", alpha=0.9)
ax.scatter(xb, yb, marker='o', s=20, c="dodgerblue", alpha=0.9)
ax.scatter(xc, yc, marker='o', s=20, c="firebrick", alpha=1.0)
ax.scatter(xd, yd, marker='o', s=20, c="goldenrod", alpha=0.9)
# Proxy artists: the legend only needs handles with the right marker and color.
line1 = Line2D(range(10), range(10), marker='o', color="goldenrod")
line2 = Line2D(range(10), range(10), marker='o', color="firebrick")
line3 = Line2D(range(10), range(10), marker='o', color="lightgreen")
line4 = Line2D(range(10), range(10), marker='o', color="dodgerblue")
plt.legend((line1, line2, line3, line4), ('line1', 'line2', 'line3', 'line4'), numpoints=1, loc=2)

Note that because loc=2, the legend is in the upper-left corner of the plot. If the text overlaps with the plot, you can shrink it by lowering the legend.fontsize rcParam, which makes the whole legend smaller.
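If only the legend-style placement is wanted and not the legend handles, matplotlib's AnchoredText takes the same kind of loc argument. It is not mentioned in the answers above, so treat this as an alternative sketch rather than the method described here. The label text is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnchoredText

fig, ax = plt.subplots()
ax.scatter([3, 5, 2, 6, 8], [5, 3, 2, 1, 5])

# loc uses the same location names as legend(); 'upper left' corresponds to loc=2.
note = AnchoredText("n = 5 points", loc="upper left")
ax.add_artist(note)
```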

Question 55

What’s a good way to check if a package is installed while within a Python script? I know it’s easy from the interpreter, but I need to do it within a script.

I guess I could check if there’s a directory on the system that’s created during the installation, but I feel like there’s a better way. I’m trying to make sure the Skype4Py package is installed, and if not I’ll install it.

My ideas for accomplishing the check:

check for a directory in the typical install path
try to import the package and, if an exception is thrown, install the package

Question 56

If you mean a python script, just do something like this:

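Python 3.3+: use sys.modules and find_spec (as written, the example also uses f-strings and the := operator, so it needs Python 3.8 or later):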

import importlib.util
import sys

# For illustrative purposes.
name = 'itertools'

if name in sys.modules:
    print(f"{name!r} already in sys.modules")
elif (spec := importlib.util.find_spec(name)) is not None:
    # If you choose to perform the actual import ...
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    print(f"{name!r} has been imported")
else:
    print(f"can't find the {name!r} module")

Python 3:

try:
    import mymodule
except ImportError as e:
    pass  # module doesn't exist, deal with it.

Python 2:

try:
    import mymodule
except ImportError, e:
    pass  # module doesn't exist, deal with it.

Question 57

Updated answer

A better way of doing this is:

import subprocess
import sys

reqs = subprocess.check_output([sys.executable, '-m', 'pip', 'freeze'])
installed_packages = [r.decode().split('==')[0] for r in reqs.split()]

The result:

print(installed_packages)

[
    "Django",
    "six",
    "requests",
]

Check if requests is installed:

if 'requests' in installed_packages:
    pass  # Do something

Why this way? Sometimes you have app name collisions. Importing from the app namespace doesn’t give you the full picture of what’s installed on the system.

Note that the proposed solution works:

When using pip to install from PyPI or from any other alternative source (like pip install http://some.site/package-name.zip or any other archive type).
When installing manually using python setup.py install.
When installing from system repositories, like sudo apt install python-requests.

Cases when it might not work:

When installing in development mode, like python setup.py develop.
When installing in development mode, like pip install -e /path/to/package/source/.
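On Python 3.8 and later, the standard library's importlib.metadata can answer the same question without starting a pip subprocess. This is a sketch of an alternative, not part of the answer above. `distribution_version` is a hypothetical helper name, and the argument is the distribution name as used with pip (for example requests), not necessarily the import name.

```python
from importlib import metadata  # in the standard library since Python 3.8


def distribution_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if distribution_version("requests") is None:
    print("requests is not installed")
else:
    print("requests", distribution_version("requests"))
```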

Old answer

A better way of doing this is:

import pip
installed_packages = pip.get_installed_distributions()

For pip>=10.x use:

from pip._internal.utils.misc import get_installed_distributions

Why this way? Sometimes you have app name collisions. Importing from the app namespace doesn’t give you the full picture of what’s installed on the system.

As a result, you get a list of pkg_resources.Distribution objects. See the following as an example:

print installed_packages
[
    "Django 1.6.4 (/path-to-your-env/lib/python2.7/site-packages)",
    "six 1.6.1 (/path-to-your-env/lib/python2.7/site-packages)",
    "requests 2.5.0 (/path-to-your-env/lib/python2.7/site-packages)",
]

Make a list of it:

flat_installed_packages = [package.project_name for package in installed_packages]

[
    "Django",
    "six",
    "requests",
]

Check if requests is installed:

if 'requests' in flat_installed_packages:
    pass  # Do something

Question 58

As of Python 3.3, you can use the find_spec() function:

import importlib.util

# For illustrative purposes.
package_name = 'pandas'

spec = importlib.util.find_spec(package_name)
if spec is None:
    print(package_name +" is not installed")

Question 59

If you want to have the check from the terminal, you can run

pip3 show package_name

and if nothing is returned, the package is not installed.

If perhaps you want to automate this check, so that for example you can install it if missing, you can have the following in your bash script:

pip3 show package_name 1>/dev/null  # use pip instead of pip3 for Python 2
if [ $? == 0 ]; then
   echo "Installed" #Replace with your actions
else
   echo "Not Installed" #Replace with your actions, 'pip3 install --upgrade package_name' ?
fi
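The same exit-code check can be made from inside a Python script. A minimal sketch, assuming pip is available for the running interpreter; `pip_show_succeeds` is a hypothetical helper name.

```python
import subprocess
import sys


def pip_show_succeeds(distribution):
    """Return True if 'pip show' exits with status 0 for this distribution."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "show", distribution],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if pip_show_succeeds("requests"):
    print("Installed")
else:
    print("Not Installed")
```

Using sys.executable -m pip queries the pip that belongs to the interpreter running the script, which avoids the pip versus pip3 question.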

Question 60

As an extension of this answer:

For Python 2.*, pip show <package_name> will perform the same task.

For example pip show numpy will return the following or alike:

Name: numpy
Version: 1.11.1
Summary: NumPy: array processing for numbers, strings, records, and objects.
Home-page: http://www.numpy.org
Author: NumPy Developers
Author-email: numpy-discussion@scipy.org
License: BSD
Location: /home/***/anaconda2/lib/python2.7/site-packages
Requires: 
Required-by: smop, pandas, tables, spectrum, seaborn, patsy, odo, numpy-stl, numba, nfft, netCDF4, MDAnalysis, matplotlib, h5py, GridDataFormats, dynd, datashape, Bottleneck, blaze, astropy

Question 61

You can use the pkg_resources module from setuptools. For example:

import pkg_resources

package_name = 'cool_package'
try:
    cool_package_dist_info = pkg_resources.get_distribution(package_name)
except pkg_resources.DistributionNotFound:
    print('{} not installed'.format(package_name))
else:
    print(cool_package_dist_info)

Note that there is a difference between a Python module and a Python package. A package can contain multiple modules, and the modules’ names might not match the package name.

Question 62

Open your command prompt and type:

pip3 list

Question 63

I’d like to add some thoughts/findings of mine to this topic. I’m writing a script that checks all requirements for a custom-made program. Many of those checks involve Python modules.

There’s a little issue with the

try:
   import ..
except:
   ..

solution. In my case, one of the Python modules is called python-nmap, but you import it with import nmap, so the names do not match. A test that tries to import python-nmap therefore returns a false result even though the package is installed. The approach also imports the module on a hit, which uses memory that a simple check does not need.

I also found that

import pip
installed_packages = pip.get_installed_distributions()

installed_packages will contain only the packages that have been installed with pip. On my system pip freeze returns over 40 Python modules, while installed_packages has only 1, the one I installed manually (python-nmap).

The solution below may not be strictly relevant to the question, but I think it’s good practice to keep the test function separate from the one that performs the install, so it might be useful for some.

This is the solution that worked for me. It is based on the answer to “How to check if a python module exists without importing it”:

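# Note: imp is deprecated since Python 3.4 and was removed in Python 3.12.
from imp import find_module

def checkPythonmod(mod):
    """Return True if mod can be located, without importing it."""
    try:
        find_module(mod)
        return True
    except ImportError:
        return False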

NOTE: this solution can’t find the module by the name python-nmap either; I have to use nmap instead (easy to live with), but in this case the module is not loaded into memory at all.
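On current Python versions the imp module no longer exists (it was removed in Python 3.12), so the same check can be written with importlib.util.find_spec. A minimal sketch; `module_available` is a hypothetical name, and like the function above it expects the import name (nmap), not the distribution name (python-nmap).

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


print(module_available("json"))
print(module_available("nmap"))
```

For a dotted name such as a.b, find_spec imports the parent package a first, so the "without importing" property only holds for top-level names.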

Question 64

If you’d like your script to install missing packages and continue, you could do something like this (using the ‘krbV’ module from the ‘python-krbV’ package as an example):

import pip
import sys

for m, pkg in [('krbV', 'python-krbV')]:
    try:
        setattr(sys.modules[__name__], m, __import__(m))
    except ImportError:
        pip.main(['install', pkg])
        setattr(sys.modules[__name__], m, __import__(m))
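pip.main was removed from pip's public interface in pip 10, so with a current pip the snippet above raises an AttributeError. A sketch of the same idea that runs pip as a subprocess instead; `pip_install` and `ensure_module` are hypothetical helper names, and the install parameter exists only so the installer can be swapped out (for example in tests).

```python
import importlib
import subprocess
import sys


def pip_install(distribution):
    """Install a distribution with the pip that belongs to this interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", distribution])


def ensure_module(module_name, distribution, install=pip_install):
    """Import module_name, installing distribution first if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        install(distribution)
        # Make the import system notice files that were just written to disk.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)


# The module is already importable here, so nothing gets installed.
json_module = ensure_module("json", "unused-distribution-name")
print(json_module.__name__)
```

For the example above, the call would be krbV = ensure_module('krbV', 'python-krbV').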
