Tag archive: anaconda

How do I upgrade to Python 3.6 with conda?

Question: How do I upgrade to Python 3.6 with conda?

I’m new to Conda package management and I want to get the latest version of Python to use f-strings in my code. Currently my version is (python -V):

Python 3.5.2 :: Anaconda 4.2.0 (x86_64)

How would I upgrade to Python 3.6?


Answer 0

Anaconda has not updated python internally to 3.6.

a) Method 1

  1. To update Python, type conda update python
  2. To update Anaconda, type conda update anaconda
  3. To upgrade between Python versions such as 3.5 to 3.6, you’ll have to run

    conda install python=$pythonversion$
    

b) Method 2 – Create a new environment (Better Method)

conda create --name py36 python=3.6

c) To get the absolute latest Python (3.6.5 at time of writing)

conda create --name py365 python=3.6.5 --channel conda-forge

You can see all this from here

Also, refer to this for force upgrading

EDIT: Anaconda now has a Python 3.6 version here
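
Whichever method you use, it is worth confirming that the interpreter you end up running is the one you expect before relying on f-strings. A minimal sketch, assuming it is run with the python from the new environment; has_fstring_support is a hypothetical helper name, not part of conda:

```python
import sys


def has_fstring_support(version_info=sys.version_info):
    """Return True if this Python version understands f-strings (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    # Show which interpreter is actually being used and what it supports.
    print("Interpreter: %s" % sys.executable)
    print("Version: %s" % sys.version.split()[0])
    print("f-strings available: %s" % has_fstring_support())
```

The script deliberately avoids f-strings itself, so it also runs (and reports False) on the old 3.5 interpreter.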


Answer 1

Creating a new environment will install python 3.6:

$ conda create --name 3point6 python=3.6
Fetching package metadata .......
Solving package specifications: ..........

Package plan for installation in environment /Users/dstansby/miniconda3/envs/3point6:

The following NEW packages will be INSTALLED:

    openssl:    1.0.2j-0     
    pip:        9.0.1-py36_1 
    python:     3.6.0-0      
    readline:   6.2-2        
    setuptools: 27.2.0-py36_0
    sqlite:     3.13.0-0     
    tk:         8.5.18-0     
    wheel:      0.29.0-py36_0
    xz:         5.2.2-1      
    zlib:       1.2.8-3 

Answer 2

I found this page with detailed instructions to upgrade Anaconda to a newer major version of Python (from Anaconda 4.0+). First,

conda update conda
conda remove argcomplete conda-manager

I also had to conda remove some packages not on the official list:

  • backports_abc
  • beautiful-soup
  • blaze-core

Depending on packages installed on your system, you may get additional UnsatisfiableError errors – simply add those packages to the remove list. Next, install the version of Python,

conda install python==3.6

which takes a while, after which a message indicated to conda install anaconda-client, so I did

conda install anaconda-client

which said it’s already there. Finally, following the directions,

conda update anaconda

I did this in the Windows 10 command prompt, but things should be similar in Mac OS X.


Answer 3

In the past, I have found it quite difficult to try to upgrade in-place.

Note: my use-case for Anaconda is as an all-in-one Python environment. I don’t bother with separate virtual environments. If you’re using conda to create environments, this may be destructive because conda creates environments with hard-links inside your Anaconda/envs directory.

So if you use environments, you may first want to export your environments. After activating your environment, do something like:

conda env export > environment.yml

After backing up your environments (if necessary), you may remove your old Anaconda (it’s very simple to uninstall Anaconda):

$ rm -rf ~/anaconda3/

and replace it by downloading the new Anaconda, e.g. Linux, 64 bit:

$ cd ~/Downloads
$ wget https://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh 

(see here for a more recent one),

and then executing it:

$ bash Anaconda3-4.3.0-Linux-x86_64.sh 

Answer 4

I’m using macOS Mojave.

These 4 steps worked for me.

  1. conda update conda
  2. conda install python=3.6
  3. conda install anaconda-client
  4. conda update anaconda

Answer 5

Best method I found:

source activate old_env
conda env export > old_env.yml

Then process it with something like this:

with open('old_env.yml', 'r') as fin, open('new_env.yml', 'w') as fout:
    for line in fin:
        if 'py35' in line:  # replace by the version you want to supersede
            line = line[:line.rfind('=')] + '\n'
        fout.write(line)

then edit manually the first (name: ...) and last line (prefix: ...) to reflect your new environment name and run:

conda env create -f new_env.yml

You might need to manually remove or change the version pin of a few packages for which the pinned version from old_env is incompatible with, or missing for, the new Python version.

I wish there was a built-in, easier way…
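
The line-by-line filter above can also be wrapped in a function, which makes it easy to try on a few sample lines before touching a real export. This is a sketch under the same assumption as the original snippet, namely that pinned lines look like name=version=build and that the build string contains a tag such as py35; drop_build_pins is a hypothetical name:

```python
def drop_build_pins(lines, build_tag="py35"):
    """Cut the trailing '=build' part from lines that mention build_tag.

    Mirrors the loop above: everything after the last '=' is removed, so
    '  - numpy=1.11.1=py35_0' becomes '  - numpy=1.11.1'. Lines that do not
    mention build_tag pass through unchanged.
    """
    cleaned = []
    for line in lines:
        if build_tag in line:
            line = line[:line.rfind("=")] + "\n"
        cleaned.append(line)
    return cleaned


# Made-up sample lines in the shape of a `conda env export` file.
sample = [
    "name: old_env\n",
    "dependencies:\n",
    "  - numpy=1.11.1=py35_0\n",
    "  - openssl=1.0.2j=0\n",
]
result = drop_build_pins(sample)
```

Only the build string is dropped; the version pin (numpy=1.11.1 here) is kept, which is why some packages may still need manual adjustment.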


How to install 2 Anacondas (Python 2 and 3) on Mac OS

Question: How to install 2 Anacondas (Python 2 and 3) on Mac OS

I’m relatively new to Mac OS. I’ve just installed XCode (for the C++ compiler) and Anaconda with the latest Python 3 (for myself). Now I’m wondering how to properly install a second Anaconda (for work) with Python 2.

I need both versions to work with iPython and Spyder IDE. The ideal way would be to have totally separate Python environments. For example, I wish I could write something like conda install scikit-learn for the Python 3 environment and something like conda2 install scikit-learn for Python 2.


Answer 0

There is no need to install Anaconda again. Conda, the package manager for Anaconda, fully supports separated environments. The easiest way to create an environment for Python 2.7 is to do

conda create -n python2 python=2.7 anaconda

This will create an environment named python2 that contains the Python 2.7 version of Anaconda. You can activate this environment with

source activate python2

This will put that environment (typically ~/anaconda/envs/python2) at the front of your PATH, so that when you type python at the terminal it will load the Python from that environment.

If you don’t want all of Anaconda, you can replace anaconda in the command above with whatever packages you want. You can use conda to install packages in that environment later, either by using the -n python2 flag to conda, or by activating the environment.
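
To confirm which environment is active after source activate python2, you can ask the interpreter itself. A small sketch using only the standard library; the paths it prints depend on where Anaconda is installed, and interpreter_summary is an illustrative name rather than anything conda provides:

```python
import os
import sys


def interpreter_summary():
    """Collect the facts that show which Python environment is in use."""
    return {
        "executable": sys.executable,  # full path of the running python
        "prefix": sys.prefix,          # root directory of the environment
        "version": "%d.%d" % sys.version_info[:2],
        "first_path_entry": os.environ.get("PATH", "").split(os.pathsep)[0],
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_summary().items()):
        print("%s = %s" % (key, value))
```

It is written to run unchanged under both Python 2.7 and Python 3, so the same file can be used to check each environment.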


Answer 1

Edit: Make sure you have both Python versions installed on your computer.

Maybe my answer is late for you but I can help someone who has the same problem!

You don’t have to download both Anaconda distributions.

If you are using Spyder and Jupyter in an Anaconda environment:

If you already have Anaconda 2, type in the terminal:

    python3 -m pip install ipykernel

    python3 -m ipykernel install --user

If you already have Anaconda 3, type in the terminal:

    python2 -m pip install ipykernel

    python2 -m ipykernel install --user

Then, before using Spyder, you can choose the Python environment. Sometimes you will only see root and your new Python environment; in that case root is your first Anaconda environment.

The same applies to Jupyter, where you can choose the Python version.

I hope it will help.


Answer 2

This may be helpful if you have more than one Python version installed and don’t know how to tell your IDEs to use a specific version.

  1. Install Anaconda. The latest version can be found here
  2. Open the navigator by typing anaconda-navigator in the terminal
  3. Open Environments. Click Create and then choose your Python version.
  4. A new environment will be created for your Python version, and you can install the IDEs listed there just by clicking Install.
  5. Launch the IDE from that environment so that it uses the Python version specified for that environment.

Hope it helps!!


How to uninstall Anaconda completely from macOS

Question: How to uninstall Anaconda completely from macOS

How can I completely uninstall Anaconda from MacOS Sierra and revert back to the original Python? I have tried using conda-clean -yes but that doesn’t work. I also removed the stuff in ~/.bash_profile but it still uses the Anaconda python and I can still run the conda command.


Answer 0

To remove the configs:

conda install anaconda-clean
anaconda-clean --yes

Once the configs are removed you can delete the anaconda install folder, which is usually under your home dir:

rm -rf ~/anaconda3

Also, the anaconda-clean --yes command creates a backup in your home directory of the format ~/.anaconda_backup/<timestamp>. Make sure to delete that one also.


EDIT (v5.2.0): Now if you want to clean everything, you will also have to delete the last two lines added to your .bash_profile. They look like:

# added by Anaconda3 5.2.0 installer
export PATH="/Users/ody/anaconda3/bin:$PATH"

Answer 1

To uninstall Anaconda, open a terminal window:

  1. Remove the entire anaconda installation directory:

    rm -rf ~/anaconda

  2. Edit ~/.bash_profile and remove the anaconda directory from your PATH environment variable.

Note: You may need to edit .bashrc and/or .profile files instead of .bash_profile

  3. Remove the following hidden files and directories, which may have been created in the home directory:

    • .condarc
    • .conda
    • .continuum

Use:

rm -rf ~/.condarc ~/.conda ~/.continuum
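
If conda still runs after these steps, a leftover PATH entry is a likely cause. The sketch below lists PATH entries that mention anaconda or miniconda, so you know what to look for in your shell startup files. The keyword list and the function name are assumptions made for illustration:

```python
import os


def find_conda_path_entries(path_value, keywords=("anaconda", "miniconda")):
    """Return the PATH entries whose text contains one of the keywords."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return [
        entry for entry in entries
        if any(keyword in entry.lower() for keyword in keywords)
    ]


if __name__ == "__main__":
    for entry in find_conda_path_entries(os.environ.get("PATH", "")):
        print("Still on PATH: %s" % entry)
```

Run it in a freshly opened terminal, since an already open shell keeps the PATH it started with.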

Answer 2

In my case (Mac High Sierra) it was installed at ~/opt/anaconda3.

https://docs.anaconda.com/anaconda/install/uninstall/


Answer 3

Open the terminal and remove your entire Anaconda directory, which will have a name such as “anaconda2” or “anaconda3”, by entering the following command: rm -rf ~/anaconda3. Then remove conda with command “conda uninstall” https://conda.io/docs/commands/conda-uninstall.html.


Answer 4

This is one more place where Anaconda had an entry that was breaking my Python install after removing Anaconda. Hoping this helps someone else.

If you are using yarn, I found this entry in my .yarn.rc file in ~/"username"

python "/Users/someone/anaconda3/bin/python3"

Removing this line fixed the last place that needed cleaning for complete removal. I am not sure how that entry was added, but removing it helped.


Answer 5

After performing the very helpful suggestions from both spicyramen & jkysam without immediate success, a simple restart of my Mac was needed to make the system recognize the changes. Hope this helps someone!


Answer 6

This has worked for me:

conda remove --all --prefix /Users/username/anaconda/bin/python

Then also remove it from $PATH in .bash_profile


Answer 7

Adding export PATH="/Users/<username>/anaconda/bin:$PATH" (or export PATH="/Users/<username>/anaconda3/bin:$PATH" if you have anaconda 3) to my ~/.bash_profile file, fixed this issue for me.


Answer 8

The official instructions seem to be here: https://docs.anaconda.com/anaconda/install/uninstall/

but if, like me, you found that didn’t work for some reason, and your conda was installed somewhere else, do this:

rm -rf ~/opt

I have no idea why it was saved there but that’s what did it for me.


This was useful to me in fixing my conda installation (if that is the reason you are uninstalling it in the first place, like me): https://stackoverflow.com/a/60902863/1601580 ended up fixing it for me. I’m not sure why conda was acting weird or installing things wrongly in the first place, though…


How to update an existing Conda environment with a .yml file

Question: How to update an existing Conda environment with a .yml file

How can a pre-existing conda environment be updated with another .yml file? This is extremely helpful when working on projects that have multiple requirement files, e.g. base.yml, local.yml, production.yml, etc.

For example, below is a base.yml file that has conda-forge, conda, and pip packages:

base.yml

name: myenv
channels:
  - conda-forge
dependencies:
  - django=1.10.5
  - pip:
    - django-crispy-forms==1.6.1

The actual environment is created with: conda env create -f base.yml.

Later on, additional packages need to be added to base.yml. Another file, say local.yml, needs to import those updates.

Previous attempts to accomplish this include:

creating a local.yml file with an import definition:

channels:

dependencies:
  - pip:
    - boto3==1.4.4
imports:
  - requirements/base. 

And then run the command: conda install -f local.yml.

This does not work. Any thoughts?


Answer 0

Try using conda env update:

conda activate myenv
conda env update --file local.yml

Or without the need to activate the environment (thanks @NumesSanguis):

conda env update --name myenv --file local.yml
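
If the update has to run from a Python script (for example as part of a project setup task), the same command can be assembled as an argument list and handed to subprocess. This is a hedged sketch: it only uses the --name, --file and --prune options that appear in this thread, assumes conda is on PATH, and build_env_update_command and update_env are made-up helper names:

```python
import subprocess


def build_env_update_command(env_file, env_name=None, prune=False):
    """Build the argument list for `conda env update`."""
    command = ["conda", "env", "update", "--file", env_file]
    if env_name is not None:
        # Target a named environment without activating it first.
        command += ["--name", env_name]
    if prune:
        # Ask conda to also remove packages no longer listed in the file.
        command.append("--prune")
    return command


def update_env(env_file, env_name=None, prune=False):
    """Run the update; raises CalledProcessError if conda exits with an error."""
    subprocess.check_call(build_env_update_command(env_file, env_name, prune))
```

For the files in the question, update_env("local.yml", env_name="myenv") would run the second command shown above.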

Answer 1

The suggested answer is partially correct. You’ll need to add the --prune option to also uninstall packages that were removed from the environment.yml. Correct command:

conda env update -f local.yml --prune

Answer 2

alkamid’s answer is on the right lines, but I have found that Conda fails to install new dependencies if the environment is already active. Deactivating the environment first resolves this:

source deactivate;
conda env update -f whatever.yml;
source activate my_environment_name; # Must be AFTER the conda env update line!

How can I one-hot encode in Python?

Question: How can I one-hot encode in Python?

I have a machine learning classification problem with 80% categorical variables. Must I use one hot encoding if I want to use some classifier for the classification? Can I pass the data to a classifier without the encoding?

I am trying to do the following for feature selection:

  1. I read the train file:

    num_rows_to_read = 10000
    train_small = pd.read_csv("../../dataset/train.csv",   nrows=num_rows_to_read)
    
  2. I change the type of the categorical features to ‘category’:

    non_categorial_features = ['orig_destination_distance',
                              'srch_adults_cnt',
                              'srch_children_cnt',
                              'srch_rm_cnt',
                              'cnt']
    
    for categorical_feature in list(train_small.columns):
        if categorical_feature not in non_categorial_features:
            train_small[categorical_feature] = train_small[categorical_feature].astype('category')
    
  3. I use one hot encoding:

    train_small_with_dummies = pd.get_dummies(train_small, sparse=True)
    

The problem is that the third step often gets stuck, although I am using a powerful machine.

Thus, without the one-hot encoding I can’t do any feature selection to determine the importance of the features.

What do you recommend?


Answer 0

Approach 1: You can use pandas’ pd.get_dummies.

Example 1:

import pandas as pd
s = pd.Series(list('abca'))
pd.get_dummies(s)
Out[]: 
     a    b    c
0  1.0  0.0  0.0
1  0.0  1.0  0.0
2  0.0  0.0  1.0
3  1.0  0.0  0.0

Example 2:

The following will transform a given column into one hot. Use prefix to have multiple dummies.

import pandas as pd
        
df = pd.DataFrame({
          'A':['a','b','a'],
          'B':['b','a','c']
        })
df
Out[]: 
   A  B
0  a  b
1  b  a
2  a  c

# Get one hot encoding of columns B
one_hot = pd.get_dummies(df['B'])
# Drop column B as it is now encoded
df = df.drop('B',axis = 1)
# Join the encoded df
df = df.join(one_hot)
df  
Out[]: 
       A  a  b  c
    0  a  0  1  0
    1  b  1  0  0
    2  a  0  0  1

Approach 2: Use Scikit-learn

Using a OneHotEncoder has the advantage of being able to fit on some training data and then transform on some other data using the same instance. We also have handle_unknown to further control what the encoder does with unseen data.

Given a dataset with three features and four samples, we let the encoder find the maximum value per feature and transform the data to a binary one-hot encoding.

>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])   
OneHotEncoder(categorical_features='all', dtype=<class 'numpy.float64'>,
   handle_unknown='error', n_values='auto', sparse=True)
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9], dtype=int32)
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1.,  0.,  0.,  1.,  0.,  0.,  1.,  0.,  0.]])

Here is the link for this example: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html


Answer 1

Much easier to use Pandas for basic one-hot encoding. If you’re looking for more options you can use scikit-learn.

For basic one-hot encoding with Pandas you pass your data frame into the get_dummies function.

For example, if I have a dataframe called imdb_movies:

…and I want to one-hot encode the Rated column, I do this:

pd.get_dummies(imdb_movies.Rated)

This returns a new dataframe with a column for every “level” of rating that exists, along with either a 1 or 0 specifying the presence of that rating for a given observation.

Usually, we want this to be part of the original dataframe. In this case, we attach our new dummy coded frame onto the original frame using “column-binding”.

We can column-bind by using Pandas concat function:

rated_dummies = pd.get_dummies(imdb_movies.Rated)
pd.concat([imdb_movies, rated_dummies], axis=1)

We can now run an analysis on our full dataframe.

SIMPLE UTILITY FUNCTION

I would recommend making yourself a utility function to do this quickly:

def encode_and_bind(original_dataframe, feature_to_encode):
    dummies = pd.get_dummies(original_dataframe[[feature_to_encode]])
    res = pd.concat([original_dataframe, dummies], axis=1)
    return(res)

Usage:

encode_and_bind(imdb_movies, 'Rated')

Result:

Also, as per @pmalbu comment, if you would like the function to remove the original feature_to_encode then use this version:

def encode_and_bind(original_dataframe, feature_to_encode):
    dummies = pd.get_dummies(original_dataframe[[feature_to_encode]])
    res = pd.concat([original_dataframe, dummies], axis=1)
    res = res.drop([feature_to_encode], axis=1)
    return(res) 

You can encode multiple features at the same time as follows:

features_to_encode = ['feature_1', 'feature_2', 'feature_3',
                      'feature_4']
res = train_set
for feature in features_to_encode:
    # Feed the previous result back in so every feature ends up encoded
    res = encode_and_bind(res, feature)
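
pandas can also encode several columns in one call through the columns argument of get_dummies, which avoids the loop altogether. A small sketch with made-up data; the frame and its values are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical stand-in for train_set: two categorical columns, one numeric.
train_set = pd.DataFrame({
    "feature_1": ["a", "b", "a"],
    "feature_2": ["x", "x", "y"],
    "cnt": [1, 2, 3],
})

# Encode the listed columns in one call. The original columns are dropped
# and the new ones are named <column>_<level>; other columns are kept as-is.
encoded = pd.get_dummies(train_set, columns=["feature_1", "feature_2"])
print(encoded.columns.tolist())
```

Depending on the pandas version, the new columns hold 0/1 integers or booleans; cast with .astype(int) if a downstream step needs numbers.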

Answer 2

You can do it with numpy.eye and the array element selection mechanism:

import numpy as np
nb_classes = 6
data = [[2, 3, 4, 0]]

def indices_to_one_hot(data, nb_classes):
    """Convert an iterable of indices to one-hot encoded labels."""
    targets = np.array(data).reshape(-1)
    return np.eye(nb_classes)[targets]

The return value of indices_to_one_hot(data, nb_classes) is now

array([[ 0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.]])

The .reshape(-1) is there to make sure you have the right labels format (you might also have [[2], [3], [4], [0]]).
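
A quick way to sanity-check the encoding is to decode it again with argmax and compare with the input. The sketch below repeats the function from this answer and adds one_hot_to_indices, a hypothetical helper that is not part of the original answer; it assumes the labels are integers in range(nb_classes):

```python
import numpy as np


def indices_to_one_hot(data, nb_classes):
    """Convert an iterable of indices to one-hot encoded labels."""
    targets = np.array(data).reshape(-1)
    return np.eye(nb_classes)[targets]


def one_hot_to_indices(one_hot):
    """Recover the class index of each row (position of the single 1)."""
    return np.argmax(one_hot, axis=1)


encoded = indices_to_one_hot([[2, 3, 4, 0]], 6)
decoded = one_hot_to_indices(encoded)
print(encoded.shape)
print(decoded)
```

Because of the reshape(-1), the nested input list is flattened first, so the result has one row per label and nb_classes columns.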


Answer 3

Firstly, easiest way to one hot encode: use Sklearn.

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

Secondly, I don’t think using pandas to one hot encode is that simple (unconfirmed though)

Creating dummy variables in pandas for python

Lastly, is it necessary for you to one hot encode? One hot encoding adds a column for every level of every categorical feature, which can drastically increase the number of features and the run time of any classifier or anything else you are going to run, especially when each categorical feature has many levels. Instead you can do dummy coding.

Using dummy encoding usually works well, for much less run time and complexity. A wise prof once told me, ‘Less is More’.

Here’s the code for my custom encoding function if you want.

from sklearn.preprocessing import LabelEncoder

#Auto encodes any dataframe column of type category or object.
def dummyEncode(df):
        columnsToEncode = list(df.select_dtypes(include=['category','object']))
        le = LabelEncoder()
        for feature in columnsToEncode:
            try:
                df[feature] = le.fit_transform(df[feature])
            except:
                print('Error encoding '+feature)
        return df

EDIT: Comparison to be clearer:

One-hot encoding: convert n levels to n-1 columns.

Index  Animal         Index  cat  mouse
  1     dog             1     0     0
  2     cat       -->   2     1     0
  3    mouse            3     0     1

You can see how this will explode your memory if you have many different types (or levels) in your categorical feature. Keep in mind, this is just ONE column.

Dummy Coding:

Index  Animal         Index  Animal
  1     dog             1      0   
  2     cat       -->   2      1 
  3    mouse            3      2

Convert to numerical representations instead. Greatly saves feature space, at the cost of a bit of accuracy.
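
To make the trade-off concrete, here is a dependency-free sketch of the two representations for the Animal column, using plain Python lists. The integer version mirrors what LabelEncoder does (levels are numbered in sorted order, so the codes differ from the table above), and this kind of encoding is more commonly called label encoding. The helper names are illustrative only:

```python
def label_encode(values):
    """Map each distinct value to an integer, numbering levels in sorted order."""
    levels = sorted(set(values))
    mapping = {level: index for index, level in enumerate(levels)}
    return [mapping[value] for value in values], levels


def one_hot_encode(values):
    """Return one row per input value with a 0/1 entry for every distinct level."""
    levels = sorted(set(values))
    rows = [[1 if value == level else 0 for level in levels] for value in values]
    return rows, levels


animals = ["dog", "cat", "mouse"]
codes, levels = label_encode(animals)      # a single column of integers
rows, columns = one_hot_encode(animals)    # one column per level
print(codes)
print(columns, rows)
```

With k distinct levels, the label-encoded feature stays a single column, while the one-hot version needs k columns (or k-1 if one level is dropped, as in the table above).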


Answer 4

One hot encoding with pandas is very easy:

import pandas as pd

def one_hot(df, cols):
    """
    @param df pandas DataFrame
    @param cols a list of columns to encode 
    @return a DataFrame with one-hot encoding
    """
    for each in cols:
        dummies = pd.get_dummies(df[each], prefix=each, drop_first=False)
        df = pd.concat([df, dummies], axis=1)
    return df

EDIT:

Another way to one-hot encode, using sklearn’s LabelBinarizer:

from sklearn.preprocessing import LabelBinarizer 
label_binarizer = LabelBinarizer()
label_binarizer.fit(all_your_labels_list) # need to be global or remembered to use it later

def one_hot_encode(x):
    """
    One hot encode a list of sample labels. Return a one-hot encoded vector for each label.
    : x: List of sample Labels
    : return: Numpy array of one-hot encoded labels
    """
    return label_binarizer.transform(x)

Answer 5

You can use the numpy.eye function.

import numpy as np

def one_hot_encode(x, n_classes):
    """
    One hot encode a list of sample labels. Return a one-hot encoded vector for each label.
    : x: List of sample Labels
    : return: Numpy array of one-hot encoded labels
    """
    return np.eye(n_classes)[x]

def main():
    labels = [0,1,2,3,4,3,2,1,0]  # named `labels` to avoid shadowing the built-in `list`
    n_classes = 5
    one_hot_list = one_hot_encode(labels, n_classes)
    print(one_hot_list)

if __name__ == "__main__":
    main()

Result

D:\Desktop>python test.py
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.]]
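
The same identity-matrix trick can be reversed with argmax. The following is a small sketch rather than part of the original answer; one_hot_decode is a hypothetical helper name, and dtype=int is used only so the output contains integers instead of floats.

```python
import numpy as np

def one_hot_encode(labels, n_classes):
    """Row i of the identity matrix is the one-hot vector for class i."""
    return np.eye(n_classes, dtype=int)[np.asarray(labels)]

def one_hot_decode(encoded):
    """Recover the class indices: the position of the 1 in each row."""
    return np.argmax(encoded, axis=1)

labels = [0, 1, 2, 3, 4, 3, 2, 1, 0]
encoded = one_hot_encode(labels, 5)
decoded = one_hot_decode(encoded)
print(encoded.shape)
print(decoded.tolist())
```

This only works when the labels are already integers in the range 0 to n_classes - 1.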

Answer 6

pandas has an inbuilt function, get_dummies, to one-hot encode a particular column or columns.

One-line code for one-hot encoding:

df=pd.concat([df,pd.get_dummies(df['column name'],prefix='column name')],axis=1).drop(['column name'],axis=1)
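
As a sketch of an alternative, get_dummies also accepts a columns argument, which encodes the named columns and drops the originals in one call. The small DataFrame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Color": ["Red", "Yellow", "Red"],
                   "Length": [20.1, 21.1, 19.1]})

# columns= names the columns to encode; each one is replaced by its
# indicator columns, prefixed with the original column name.
encoded = pd.get_dummies(df, columns=["Color"])
print(list(encoded.columns))
```

The untouched columns come first, followed by the new indicator columns.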

Answer 7

Here is a solution using DictVectorizer and the Pandas DataFrame.to_dict('records') method.

>>> import pandas as pd
>>> X = pd.DataFrame({'income': [100000,110000,90000,30000,14000,50000],
                      'country':['US', 'CAN', 'US', 'CAN', 'MEX', 'US'],
                      'race':['White', 'Black', 'Latino', 'White', 'White', 'Black']
                     })

>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer()
>>> qualitative_features = ['country','race']
>>> X_qual = v.fit_transform(X[qualitative_features].to_dict('records'))
>>> v.vocabulary_
{'country=CAN': 0,
 'country=MEX': 1,
 'country=US': 2,
 'race=Black': 3,
 'race=Latino': 4,
 'race=White': 5}

>>> X_qual.toarray()
array([[ 0.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  1.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  1.,  0.,  0.]])

Answer 8

One-hot encoding requires a bit more than converting the values to indicator variables. Typically the ML process requires you to apply this encoding several times, to validation or test data sets, and to apply the model you construct to real-time observed data. You should store the mapping (transform) that was used to construct the model. A good solution would use DictVectorizer, or LabelEncoder followed by get_dummies. Here is a function that you can use:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

def oneHotEncode2(df, le_dict=None):
    # Train mode when no encoders are passed in; otherwise reuse the stored ones.
    # (A mutable default such as le_dict={} would be shared between calls.)
    train = not le_dict
    if train:
        le_dict = {}
        columnsToEncode = list(df.select_dtypes(include=['category','object']))
    else:
        columnsToEncode = list(le_dict.keys())

    for feature in columnsToEncode:
        if train:
            le_dict[feature] = LabelEncoder()
        try:
            if train:
                df[feature] = le_dict[feature].fit_transform(df[feature])
            else:
                df[feature] = le_dict[feature].transform(df[feature])

            df = pd.concat([df,
                            pd.get_dummies(df[feature]).rename(columns=lambda x: feature + '_' + str(x))], axis=1)
            df = df.drop(feature, axis=1)
        except Exception:
            print('Error encoding '+feature)
            #df[feature]  = df[feature].convert_objects(convert_numeric='force')
            df[feature]  = df[feature].apply(pd.to_numeric, errors='coerce')
    return (df, le_dict)

This works on a pandas dataframe and for each column of the dataframe it creates and returns a mapping back. So you would call it like this:

train_data, le_dict = oneHotEncode2(train_data)

Then on the test data, the call is made by passing the dictionary returned back from training:

test_data, _ = oneHotEncode2(test_data, le_dict)

An equivalent method is to use DictVectorizer. A related post on my blog covers this; I mention it here since it provides some reasoning behind this approach over simply using get_dummies (disclosure: this is my own blog).
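
The idea of storing the mapping can be illustrated without any library. This is a simplified sketch, not the author's implementation: fit_levels and transform are hypothetical names, and mapping unseen values to an all-zeros row is an assumption about how you might want to handle them.

```python
def fit_levels(values):
    """Record the distinct levels seen in training, in a fixed (sorted) order."""
    return sorted(set(values))

def transform(values, levels):
    """One-hot encode with the stored levels; unseen values become all zeros."""
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for value in values:
        row = [0] * len(levels)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows

train = ["US", "CAN", "US", "MEX"]
test = ["CAN", "FRA"]  # "FRA" never appeared in training

levels = fit_levels(train)
train_encoded = transform(train, levels)
test_encoded = transform(test, levels)
print(levels)
print(test_encoded)
```

Because the test data is encoded with the levels learned from training, both sets end up with the same columns in the same order, which is what the model expects.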


Answer 9

You can pass the data to catboost classifier without encoding. Catboost handles categorical variables itself by performing one-hot and target expanding mean encoding.


Answer 10

You can do the following as well. Note for the below you don’t have to use pd.concat.

import pandas as pd 
# initialise data of lists. 
data = {'Color':['Red', 'Yellow', 'Red', 'Yellow'], 'Length':[20.1, 21.1, 19.1, 18.1],
       'Group':[1,2,1,2]} 

# Create DataFrame 
df = pd.DataFrame(data) 

for _c in df.select_dtypes(include=['object']).columns:
    print(_c)
    df[_c]  = pd.Categorical(df[_c])
df_transformed = pd.get_dummies(df)
df_transformed

You can also change explicit columns to categorical. For example, here I am changing Color and Group:

import pandas as pd 
# initialise data of lists. 
data = {'Color':['Red', 'Yellow', 'Red', 'Yellow'], 'Length':[20.1, 21.1, 19.1, 18.1],
       'Group':[1,2,1,2]} 

# Create DataFrame 
df = pd.DataFrame(data) 
columns_to_change = list(df.select_dtypes(include=['object']).columns)
columns_to_change.append('Group')
for _c in columns_to_change:
    print(_c)
    df[_c]  = pd.Categorical(df[_c])
df_transformed = pd.get_dummies(df)
df_transformed

Answer 11

I know I’m late to this party, but the simplest way to hot encode a dataframe in an automated way is to use this function:

def hot_encode(df):
    obj_df = df.select_dtypes(include=['object'])
    return pd.get_dummies(df, columns=obj_df.columns).values

Answer 12

I used this in my acoustic model; it may help in yours.

import numpy as np

def one_hot_encoding(x, n_out):
    x = x.astype(int)  
    shape = x.shape
    x = x.flatten()
    N = len(x)
    x_categ = np.zeros((N,n_out))
    x_categ[np.arange(N), x] = 1
    return x_categ.reshape((shape)+(n_out,))
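
What sets this function apart is that it keeps the input's shape and appends one axis for the classes. The usage sketch below repeats the function so that it runs on its own; the frame_labels array is a made-up example, not data from the answer.

```python
import numpy as np

def one_hot_encoding(x, n_out):
    # Same logic as the function above, repeated so the example is self-contained.
    x = x.astype(int)
    shape = x.shape
    x = x.flatten()
    N = len(x)
    x_categ = np.zeros((N, n_out))
    x_categ[np.arange(N), x] = 1
    return x_categ.reshape(shape + (n_out,))

# Hypothetical input: 2 sequences of 3 frames, each frame labelled 0..3.
frame_labels = np.array([[0, 1, 3],
                         [2, 2, 0]])
encoded = one_hot_encoding(frame_labels, n_out=4)
print(encoded.shape)
print(encoded[0, 2])
```

The (2, 3) input becomes a (2, 3, 4) array, and encoded[0, 2] is the one-hot vector for label 3.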

Answer 13

To add to the other answers, here is how I did it with a Python 2.0 function using NumPy:

def one_hot(y_):
    # Function to encode output labels from number indexes 
    # e.g.: [[5], [0], [3]] --> [[0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]

    y_ = y_.reshape(len(y_))
    n_values = np.max(y_) + 1
    return np.eye(n_values)[np.array(y_, dtype=np.int32)]  # Returns FLOATS

The line n_values = np.max(y_) + 1 could be replaced with a hard-coded value so that you always get the right number of output neurons, for example when you use mini-batches and a given batch may not contain the highest label.

Demo project/tutorial where this function has been used: https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition


Answer 14

This works for me:

pandas.factorize( ['B', 'C', 'D', 'B'] )[0]

Output:

[0, 1, 2, 0]

Answer 15

It can and should be as easy as:

class OneHotEncoder:
    def __init__(self,optionKeys):
        length=len(optionKeys)
        self.__dict__={optionKeys[j]:[0 if i!=j else 1 for i in range(length)] for j in range(length)}

Usage:

ohe=OneHotEncoder(["A","B","C","D"])
print(ohe.A)
print(ohe.D)

Answer 16

Expanding @Martin Thoma’s answer

import numpy as np

def one_hot_encode(y):
    """Convert an iterable of indices to one-hot encoded labels."""
    y = y.flatten() # Sometimes not flattened vector is passed e.g (118,1) in these cases
    # the function ends up creating a tensor e.g. (118, 2, 1). flatten removes this issue
    nb_classes = len(np.unique(y)) # get the number of unique classes
    standardised_labels = dict(zip(np.unique(y), np.arange(nb_classes))) # get the class labels as a dictionary
    # which then is standardised. E.g imagine class labels are (4,7,9) if a vector of y containing 4,7 and 9 is
    # directly passed then np.eye(nb_classes)[4] or 7,9 throws an out of index error.
    # standardised labels fixes this issue by returning a dictionary;
    # standardised_labels = {4:0, 7:1, 9:2}. The values of the dictionary are mapped to keys in y array.
    # standardised_labels also removes the error that is raised if the labels are floats. E.g. 1.0; element
    # cannot be called by an integer index e.g y[1.0] - throws an index error.
    targets = np.vectorize(standardised_labels.get)(y) # map the dictionary values to array.
    return np.eye(nb_classes)[targets]
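
The relabelling step can also be done with np.unique alone. This is a sketch of an alternative, not the answer's own method: return_inverse=True gives, for each element, its index in the sorted array of unique labels, which replaces the dictionary lookup. Returning classes alongside the encoding is an addition made here so that the columns can be mapped back to labels.

```python
import numpy as np

def one_hot_encode(y):
    """One-hot encode arbitrary labels (e.g. 4, 7, 9, or floats)."""
    y = np.asarray(y).ravel()
    # classes: sorted unique labels; targets: index of each label in classes.
    classes, targets = np.unique(y, return_inverse=True)
    return np.eye(len(classes), dtype=int)[targets], classes

encoded, classes = one_hot_encode([4, 7, 9, 7])
print(classes.tolist())
print(encoded.tolist())
```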

Answer 17

Short Answer

Here is a function to do one-hot-encoding without using numpy, pandas, or other packages. It takes a list of integers, booleans, or strings (and perhaps other types too).

import typing


def one_hot_encode(items: list) -> typing.List[list]:
    results = []
    # find the unique items (we want unique items because duplicate items will have the same encoding)
    unique_items = list(set(items))
    # sort the unique items
    sorted_items = sorted(unique_items)
    # find how long the list of each item should be
    max_index = len(unique_items)

    for item in items:
        # create a list of zeros the appropriate length
        one_hot_encoded_result = [0 for i in range(0, max_index)]
        # find the index of the item
        one_hot_index = sorted_items.index(item)
        # change the zero at the index from the previous line to a one
        one_hot_encoded_result[one_hot_index] = 1
        # add the result
        results.append(one_hot_encoded_result)

    return results

Example:

one_hot_encode([2, 1, 1, 2, 5, 3])

# [[0, 1, 0, 0],
#  [1, 0, 0, 0],
#  [1, 0, 0, 0],
#  [0, 1, 0, 0],
#  [0, 0, 0, 1],
#  [0, 0, 1, 0]]
one_hot_encode([True, False, True])

# [[0, 1], [1, 0], [0, 1]]
one_hot_encode(['a', 'b', 'c', 'a', 'e'])

# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]

Long(er) Answer

I know there are already a lot of answers to this question, but I noticed two things. First, most of the answers use packages like numpy and/or pandas. And this is a good thing. If you are writing production code, you should probably be using robust, fast algorithms like those provided in the numpy/pandas packages. But, for the sake of education, I think someone should provide an answer which has a transparent algorithm and not just an implementation of someone else’s algorithm. Second, I noticed that many of the answers do not provide a robust implementation of one-hot encoding because they do not meet one of the requirements below. Below are some of the requirements (as I see them) for a useful, accurate, and robust one-hot encoding function:

A one-hot encoding function must:

  • handle lists of various types (e.g. integers, strings, floats, etc.) as input
  • handle an input list with duplicates
  • return a list of lists corresponding to the inputs (in the same order)
  • return a list of lists where each list is as short as possible

I tested many of the answers to this question and most of them fail on one of the requirements above.


Answer 18

Try this:

!pip install category_encoders
import category_encoders as ce

categorical_columns = [...the list of names of the columns you want to one-hot-encode ...]
encoder = ce.OneHotEncoder(cols=categorical_columns, use_cat_names=True)
df_train_encoded = encoder.fit_transform(df_train_small)

df_train_encoded.head()

The resulting dataframe df_train_encoded is the same as the original, but the categorical features are now replaced with their one-hot-encoded versions.

More information on category_encoders here.


Answer 19

Here is the approach I tried:

import numpy as np
# converting to one_hot

def one_hot_encoder(value, datal):

    datal[value] = 1

    return datal


def _one_hot_values(labels_data):
    encoded = [0] * len(labels_data)

    for j, i in enumerate(labels_data):
        max_value = [0] * (np.max(labels_data) + 1)

        encoded[j] = one_hot_encoder(i, max_value)

    return np.array(encoded)

How do I revert to a previous package in Anaconda?

Question: How do I revert to a previous package in Anaconda?

If I do

conda info pandas

I can see all of the packages available.

I updated my pandas to the latest this morning, but I need to revert to a prior version now. I tried

conda update pandas 0.13.1

but that didn’t work. How do I specify which version to use?


Answer 0

I had to use the install function instead:

conda install pandas=0.13.1

Answer 1

For the case that you wish to revert a recently installed package that made several changes to dependencies (such as tensorflow), you can “roll back” to an earlier installation state via the following method:

conda list --revisions
conda install --revision [revision number]

The first command shows previous installation revisions (with dependencies) and the second reverts to whichever revision number you specify.

Note that if you wish to (re)install a later revision, you may have to sequentially reinstall all intermediate versions. If you had been at revision 23, reinstalled revision 20 and wish to return, you may have to run each:

conda install --revision 21
conda install --revision 22
conda install --revision 23

Anaconda export environment file

Question: Anaconda export environment file

How can I make an Anaconda environment file that can be used on other computers?

I exported my Anaconda Python environment to YML using conda env export > environment.yml. The exported environment.yml contains the line prefix: /home/superdev/miniconda3/envs/juicyenv, which maps to my Anaconda location and will be different on other people’s PCs.


Answer 0

I can’t find anything in the conda specs which allows you to export an environment file without the prefix: ... line. However, as Alex pointed out in the comments, conda doesn’t seem to care about the prefix line when creating an environment from file.

With that in mind, if you want the other user to have no knowledge of your default install path, you can remove the prefix line with grep before writing to environment.yml.

conda env export | grep -v "^prefix: " > environment.yml

Either way, the other user then runs:

conda env create -f environment.yml

and the environment will get installed in their default conda environment path.

If you want to specify a different install path than the default for your system (not related to ‘prefix’ in the environment.yml), just use the -p flag followed by the required path.

conda env create -f environment.yml -p /home/user/anaconda3/envs/env_name

Note that Conda recommends creating the environment.yml by hand, which is especially important if you are wanting to share your environment across platforms (Windows/Linux/Mac). In this case, you can just leave out the prefix line.
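
On systems without grep, the same filtering can be done in a few lines of Python. This is a sketch under the assumption that the exported text is already in a string; strip_prefix is a hypothetical helper, and the sample export is shortened for illustration.

```python
def strip_prefix(env_yaml: str) -> str:
    """Drop any line that starts with 'prefix: ', like grep -v "^prefix: "."""
    kept = [line for line in env_yaml.splitlines()
            if not line.startswith("prefix: ")]
    return "\n".join(kept) + "\n"

# Hypothetical export, shortened for illustration.
exported = (
    "name: juicyenv\n"
    "channels:\n"
    "  - defaults\n"
    "dependencies:\n"
    "  - python=3.6\n"
    "prefix: /home/superdev/miniconda3/envs/juicyenv\n"
)
cleaned = strip_prefix(exported)
print(cleaned)
```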


Answer 1

The easiest way to save the packages from an environment to be installed in another computer is:

$ conda list -e > req.txt

then you can install the environment using

$ conda create -n new_environment --file req.txt

If you use pip, use the following commands instead (reference: https://pip.pypa.io/en/stable/reference/pip_freeze/):

$ env1/bin/pip freeze > requirements.txt
$ env2/bin/pip install -r requirements.txt

Answer 2

  • Linux

    conda env export --no-builds | grep -v "prefix" > environment.yml

  • Windows

    conda env export --no-builds | findstr -v "prefix" > environment.yml


Rationale: By default, conda env export includes the build information:

$ conda env export
...
dependencies:
  - backcall=0.1.0=py37_0
  - blas=1.0=mkl
  - boto=2.49.0=py_0
...

You can instead export your environment without build info:

$ conda env export --no-builds
...
dependencies:
  - backcall=0.1.0
  - blas=1.0
  - boto=2.49.0
...

Which unties the environment from the Python version and OS.


Answer 3

I find exporting the packages in string format only is more portable than exporting the whole conda environment. As the previous answer already suggested:

$ conda list -e > requirements.txt

However, this requirements.txt contains build numbers which are not portable between operating systems, e.g. between Mac and Ubuntu. In conda env export we have the option --no-builds but not with conda list -e, so we can remove the build number by issuing the following command:

$ sed -i -E "s/^(.*\=.*)(\=.*)/\1/" requirements.txt 

And recreate the environment on another computer:

conda create -n recreated_env --file requirements.txt 
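
For platforms where sed -i -E is not available, the same substitution can be sketched in Python. The regular expression is the one from the sed command above; strip_build is a hypothetical helper, and the sample lines only imitate the name=version=build format.

```python
import re

def strip_build(line: str) -> str:
    """Turn 'name=version=build' into 'name=version'; leave other lines alone."""
    return re.sub(r"^(.*=.*)(=.*)", r"\1", line)

# Hypothetical lines in the format written by `conda list -e`.
lines = [
    "# platform: linux-64",
    "backcall=0.1.0=py37_0",
    "blas=1.0=mkl",
]
stripped = [strip_build(line) for line in lines]
print(stripped)
```

Lines with fewer than two equals signs, such as the comment header, pass through unchanged.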

Answer 4

  1. First activate your conda environment (the one you want to export/back up)
conda activate myEnv
  2. Export all packages to a file (myEnvBkp.txt)
conda list --explicit > myEnvBkp.txt
  3. Restore/import the environment:
conda create --name myEnvRestored --file myEnvBkp.txt

Should conda or conda-forge be used for Python environments?

Question: Should conda or conda-forge be used for Python environments?

Conda and conda-forge are both Python package managers. What is the appropriate choice when a package exists in both repositories? Django, for example, can be installed with either, but the difference between the two is several dependencies (conda-forge has many more). There is no explanation for these differences, not even a simple README.

Which one should be used? Conda or conda-forge? Does it matter?


Answer 0

The short answer is that, in my experience generally, it doesn’t matter which you use.

The long answer:

So conda-forge is an additional channel from which packages may be installed. In this sense, it is not any more special than the default channel, or any of the other hundreds (thousands?) of channels that people have posted packages to. You can add your own channel if you sign up at https://anaconda.org and upload your own Conda packages.

Here we need to make the distinction, which I think you’re not clear about from your phrasing in the question, between conda, the cross-platform package manager, and conda-forge, the package channel. Anaconda Inc. (formerly Continuum IO), the main developers of the conda software, also maintain a separate channel of packages, which is the default when you type conda install packagename without changing any options.

There are three ways to change the options for channels. The first two are done every time you install a package and the last one is persistent. The first one is to specify a channel every time you install a package:

conda install -c some-channel packagename

Of course, the package has to exist on that channel. This installs packagename and all its dependencies from some-channel. Alternatively, you can specify:

conda install some-channel::packagename

The package still has to exist on some-channel, but now, only packagename will be pulled from some-channel. Any other packages that are needed to satisfy dependencies will be searched for from your default list of channels.
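
To make the difference between the two forms concrete, here is a minimal Python sketch of how a `channel::name` spec splits into its parts. `PackageSpec` and `parse_spec` are hypothetical names made up for illustration; this is not conda's own parser.

```python
from typing import NamedTuple, Optional


class PackageSpec(NamedTuple):
    """Hypothetical container for a parsed package spec."""
    channel: Optional[str]  # None means "search the configured channels"
    name: str


def parse_spec(spec: str) -> PackageSpec:
    """Split 'channel::name' into its parts; a bare name has no channel."""
    channel, sep, name = spec.partition("::")
    if not sep:
        return PackageSpec(None, spec)
    return PackageSpec(channel, name)


print(parse_spec("some-channel::packagename"))
print(parse_spec("packagename"))
```

Only the package named in the spec is tied to the channel; in this toy model its dependencies would each be looked up as bare names.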

To see your channel configuration, you can write:

conda config --show channels

You can control the order that channels are searched with conda config. You can write:

conda config --add channels some-channel

to add the channel some-channel to the top of the channels configuration list. This gives some-channel the highest priority. Priority determines (in part) which channel is selected when more than one channel has a particular package. To add the channel to the end of the list and give it the lowest priority, type

conda config --append channels some-channel

If you would like to remove the channel that you added, you can do so by writing

conda config --remove channels some-channel

See

conda config -h

for more options.
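
As a rough illustration of what the three `conda config` commands above do to the channel list, here is a toy Python model. The function names are hypothetical, and the handling of a channel that is already in the list (it is moved rather than duplicated) is an assumption of this sketch.

```python
def add_channel(channels, name):
    """Model of `--add channels`: the channel goes to the top (highest priority)."""
    return [name] + [c for c in channels if c != name]


def append_channel(channels, name):
    """Model of `--append channels`: the channel goes to the end (lowest priority)."""
    return [c for c in channels if c != name] + [name]


def remove_channel(channels, name):
    """Model of `--remove channels`: the channel is dropped from the list."""
    return [c for c in channels if c != name]


channels = ["defaults"]
print(add_channel(channels, "some-channel"))     # some-channel searched first
print(append_channel(channels, "some-channel"))  # some-channel searched last
print(remove_channel(add_channel(channels, "some-channel"), "some-channel"))
```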

With all of that said, there are four main reasons to use the conda-forge channel instead of the defaults channel maintained by Anaconda:

  1. Packages on conda-forge may be more up-to-date than those on the defaults channel
  2. There are packages on the conda-forge channel that aren’t available from defaults
  3. You would prefer to use a dependency such as openblas (from conda-forge) instead of mkl (from defaults).
  4. If you are installing a package that requires a compiled library (e.g., a C extension or a wrapper around a C library), it may reduce the chance of incompatibilities if you install all of the packages in an environment from a single channel due to binary compatibility of the base C library (but this advice may be out of date/change in the future).

Answer 1

Anaconda has changed its Terms of Service so that “heavy commercial users” have to pay. That requirement does not cover the conda-forge channel.

You probably want to stick to conda-forge if you don’t want to pay for the usage. As stated in the docs:

conda config --add channels conda-forge
conda config --set channel_priority strict
conda install <package-name>

You could also use miniforge, which has conda-forge as the default channel and supports the ppc64le and aarch64 platforms as well as the other usual ones.
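
A toy Python sketch of what strict channel priority means in practice: a package is taken from the first channel, in priority order, that carries it, and lower-priority channels are not considered for that package. The `index` contents below are made-up sample data, and `pick_channel` is a hypothetical helper, not part of conda.

```python
def pick_channel(package, channels, index):
    """Return the first channel, in priority order, that carries `package`."""
    for channel in channels:
        if package in index.get(channel, set()):
            return channel
    return None


# Hypothetical channel contents, for illustration only.
index = {
    "conda-forge": {"numpy", "python-constraint"},
    "defaults": {"numpy", "mkl"},
}
channels = ["conda-forge", "defaults"]  # conda-forge was added to the top

for package in ("numpy", "mkl", "not-a-package"):
    print(package, "->", pick_channel(package, channels, index))
```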


Answer 2

The conda-forge channel is where you can find packages that have been built for conda but are not yet part of the official Anaconda distribution.

Generally, you can use either of them.


Answer 3

There are some Python libraries that you cannot install with a plain conda install because they are not available on the default channels; you have to use conda-forge. In my experience, pip is more general than conda in this respect. For instance, if you want to install python-constraint you can do it via pip install, but to install it via conda you have to specify the channel, conda-forge:

conda install -c conda-forge python-constraint  # works

but not

conda install python-constraint

What is the difference between Conda and Anaconda?

Question: What is the difference between Conda and Anaconda?

Post-question update:

See Introduction to Conda for more details.


The problem:

I first installed Anaconda on my Ubuntu machine at ~/anaconda. When I tried to update it, the documentation from Continuum Analytics said I should use the following commands:

conda update conda
conda update anaconda

Then I realized that I did not have conda installed, so I installed it using the documentation from here.

After installing conda, when I ran conda update anaconda, I got the following error:

Error: package ‘anaconda’ is not installed in /home/xiang/miniconda

It appears conda assumes my Anaconda is installed under /home/xiang/miniconda, which is NOT true.

The questions:

  1. What are the differences between conda and anaconda?
  2. How can I tell conda where my anaconda is installed?

Answer 0

conda is the package manager. Anaconda is a set of about a hundred packages including conda, numpy, scipy, ipython notebook, and so on.

You installed Miniconda, which is a smaller alternative to Anaconda that is just conda and its dependencies, not those listed above.

Once you have Miniconda, you can easily install Anaconda into it with conda install anaconda.


Answer 1

Brief

conda is both a command line tool, and a python package.

Miniconda installer = Python + conda

Anaconda installer = Python + conda + meta package anaconda

meta Python pkg anaconda = about 160 other Python packages for daily use in data science

Anaconda installer = Miniconda installer + conda install anaconda

Detail

conda is both an environment manager and a package manager; the name refers to the tool itself. conda makes it possible to

  • install a package with conda install flake8
  • create an environment with any version of Python with conda create -n myenv python=3.6

conda is not a standalone binary; it is a Python package. To make conda work, you have to create a Python environment and install the conda package into it. This is where the Anaconda and Miniconda installers come in.

The Miniconda installer installs Python and the conda package. The Anaconda installer not only does what Miniconda does, it also installs a meta package named anaconda for you.

Meta packages are packages that do NOT contain actual software and simply depend on other packages to be installed.

The 160+ Python packages pulled in by the anaconda package are listed in info/recipe/meta.yaml in its source:

package:
    name: anaconda
    version: '2019.07'
build:
    ignore_run_exports:
        - '*'
    number: '0'
    pin_depends: strict
    string: py36_0
requirements:
    build:
        - python 3.6.8 haf84260_0
    is_meta_pkg:
        - true
    run:
        - alabaster 0.7.12 py36_0
        - anaconda-client 1.7.2 py36_0
        - anaconda-project 0.8.3 py_0
        # ...
        - beautifulsoup4 4.7.1 py36_1
        # ...
        - curl 7.65.2 ha441bb4_0
        # ...
        - hdf5 1.10.4 hfa1e0ec_0
        # ...
        - ipykernel 5.1.1 py36h39e3cac_0
        - ipython 7.6.1 py36h39e3cac_0
        - ipython_genutils 0.2.0 py36h241746c_0
        - ipywidgets 7.5.0 py_0
        # ...
        - jupyter 1.0.0 py36_7
        - jupyter_client 5.3.1 py_0
        - jupyter_console 6.0.0 py36_0
        - jupyter_core 4.5.0 py_0
        - jupyterlab 1.0.2 py36hf63ae98_0
        - jupyterlab_server 1.0.0 py_0
        # ...
        - matplotlib 3.1.0 py36h54f8f79_0
        # ...
        - mkl 2019.4 233
        - mkl-service 2.0.2 py36h1de35cc_0
        - mkl_fft 1.0.12 py36h5e564d8_0
        - mkl_random 1.0.2 py36h27c97d8_0
        # ...
        - nltk 3.4.4 py36_0
        # ...
        - numpy 1.16.4 py36hacdab7b_0
        - numpy-base 1.16.4 py36h6575580_0
        - numpydoc 0.9.1 py_0
        # ...
        - pandas 0.24.2 py36h0a44026_0
        - pandoc 2.2.3.2 0
        # ...
        - pillow 6.1.0 py36hb68e598_0
        # ...
        - pyqt 5.9.2 py36h655552a_2
        # ...
        - qt 5.9.7 h468cd18_1
        - qtawesome 0.5.7 py36_1
        - qtconsole 4.5.1 py_0
        - qtpy 1.8.0 py_0
        # ...
        - requests 2.22.0 py36_0
        # ...
        - sphinx 2.1.2 py_0
        - sphinxcontrib 1.0 py36_1
        - sphinxcontrib-applehelp 1.0.1 py_0
        - sphinxcontrib-devhelp 1.0.1 py_0
        - sphinxcontrib-htmlhelp 1.0.2 py_0
        - sphinxcontrib-jsmath 1.0.1 py_0
        - sphinxcontrib-qthelp 1.0.2 py_0
        - sphinxcontrib-serializinghtml 1.1.3 py_0
        - sphinxcontrib-websupport 1.1.2 py_0
        - spyder 3.3.6 py36_0
        - spyder-kernels 0.5.1 py36_0
        # ...

The packages pre-installed by the anaconda meta package are mainly for web scraping and data science: requests, beautifulsoup, numpy, nltk, and so on.
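
Each entry under `run:` in the listing above has the shape `name version build`, which is what makes the meta package pin an exact set of packages. A minimal Python sketch that splits a few of those entries (copied from the listing) into their parts; `parse_requirement` is a hypothetical helper, not part of conda:

```python
# A few `run:` entries copied from the meta.yaml excerpt above.
RUN_LINES = [
    "numpy 1.16.4 py36hacdab7b_0",
    "pandas 0.24.2 py36h0a44026_0",
    "mkl 2019.4 233",
    "requests 2.22.0 py36_0",
]


def parse_requirement(line):
    """Split 'name version build' into a small dictionary."""
    name, version, build = line.split()
    return {"name": name, "version": version, "build": build}


pinned = [parse_requirement(line) for line in RUN_LINES]
for req in pinned:
    print(f"{req['name']} is pinned to {req['version']} (build {req['build']})")
```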


‘Conda’ is not recognized as internal or external command

Question: ‘Conda’ is not recognized as internal or external command

I installed Anaconda3 4.4.0 (32 bit) on my Windows 7 Professional machine and imported NumPy and Pandas on Jupyter notebook so I assume Python was installed correctly. But when I type conda list and conda --version in command prompt, it says conda is not recognized as internal or external command.

I have set environment variable for Anaconda3; Variable Name: Path, Variable Value: C:\Users\dipanwita.neogy\Anaconda3

How do I make it work?


Answer 0

Although you were offered a good solution by others I think it is helpful to point out what is really happening. As per the Anaconda 4.4 changelog, https://docs.anaconda.com/anaconda/reference/release-notes/#what-s-new-in-anaconda-4-4:

On Windows, the PATH environment variable is no longer changed by default, as this can cause trouble with other software. The recommended approach is to instead use Anaconda Navigator or the Anaconda Command Prompt (located in the Start Menu under “Anaconda”) when you wish to use Anaconda software.

(Note: recent Win 10 does not assume you have privileges to install or update. If the command fails, right-click on the Anaconda Command Prompt, choose “More”, then choose “Run as administrator”)

This is a change from previous installations. It is suggested to use Navigator or the Anaconda Prompt although you can always add it to your PATH as well. During the install the box to add Anaconda to the PATH is now unchecked but you can select it.
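
A quick way to see which situation you are in is to ask whether `conda` can be found on the current PATH. The sketch below uses Python's standard `shutil.which`; `find_command` is just a hypothetical wrapper name.

```python
import shutil


def find_command(name):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


location = find_command("conda")
if location is None:
    print("conda is not on PATH; use the Anaconda Prompt or add it to PATH")
else:
    print(f"conda found at {location}")
```

Run it from a plain Command Prompt and from the Anaconda Prompt to compare the two.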


Answer 1

I faced the same issue on Windows 10. Updating the environment variables with the following steps fixed it.

I know this is a lengthy answer for a simple environment setup, but I thought it may be useful for new Windows 10 users.

1) Open Anaconda Prompt:

2) Check Conda Installed Location.

where conda

3) Open Advanced System Settings

4) Click on Environment Variables

5) Edit Path

6) Add New Path

 C:\Users\RajaRama\Anaconda3\Scripts

 C:\Users\RajaRama\Anaconda3

 C:\Users\RajaRama\Anaconda3\Library\bin

7) Open Command Prompt and Check Versions

8) After the 7th step, type conda install anaconda-navigator in cmd, then press y
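
The check in steps 5 to 7 can be sketched in Python: given the install root reported by `where conda`, build the three directories from step 6 and report which of them are still missing from a PATH value. The function names are hypothetical, and `ntpath` is used so that the Windows-style paths are handled the same way on any operating system.

```python
import ntpath


def anaconda_path_entries(root):
    """The three directories from step 6, built from the install root."""
    return [
        ntpath.join(root, "Scripts"),
        root,
        ntpath.join(root, "Library", "bin"),
    ]


def _normalize(path):
    # Compare case-insensitively and ignore trailing separators.
    return ntpath.normcase(ntpath.normpath(path))


def missing_entries(path_value, root):
    """Return the required directories that are absent from `path_value`."""
    present = {_normalize(p) for p in path_value.split(";") if p}
    return [e for e in anaconda_path_entries(root) if _normalize(e) not in present]


root = r"C:\Users\RajaRama\Anaconda3"
example_path = r"C:\Windows\system32;C:\Users\RajaRama\Anaconda3\Scripts"
for entry in missing_entries(example_path, root):
    print("missing from PATH:", entry)
```

On a real machine you could pass `os.environ["PATH"]` as `path_value` instead of the example string.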


Answer 2

I found the solution. Variable value should be C:\Users\dipanwita.neogy\Anaconda3\Scripts


Answer 3

When you install Anaconda on Windows now, it doesn’t automatically add Python or Conda to your path.

During the installation process you can check the box that does this, or you can add python and/or conda to your path manually, as described below.

If you don’t know where your conda and/or python is, type the following commands into your Anaconda Prompt:

where python
where conda

Next, you can add Python and Conda to your path by using the setx command in your command prompt (replace C:\Users\mgalarnyk\Anaconda2 with the results you got when running where python and where conda).

SETX PATH "%PATH%;C:\Users\mgalarnyk\Anaconda2\Scripts;C:\Users\mgalarnyk\Anaconda2"

Next, close that command prompt and open a new one. Congrats, you can now use conda and python.

Source: https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444
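
One caveat worth checking before running the SETX command above: to my knowledge, setx truncates values longer than 1024 characters, so a long existing PATH could be cut off. The Python sketch below only estimates whether the new value would cross that limit; the limit constant and the helper name are assumptions of this sketch, so verify them against your Windows version.

```python
SETX_LIMIT = 1024  # assumed setx value limit, in characters


def setx_would_truncate(current_path, additions):
    """Return True if PATH plus the new entries would exceed the assumed limit."""
    new_value = ";".join([current_path] + list(additions))
    return len(new_value) > SETX_LIMIT


additions = [
    r"C:\Users\mgalarnyk\Anaconda2\Scripts",
    r"C:\Users\mgalarnyk\Anaconda2",
]
print(setx_would_truncate(r"C:\Windows\system32;C:\Windows", additions))
```

If it reports True, editing the variable through the Environment Variables dialog is the safer route.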


Answer 4

Just to be clear, you need to go to Control Panel\System\Advanced system settings\Environment Variables\Path, then hit Edit and add:

C:\Users\user.user\Anaconda3\Scripts

to the end, and then restart the cmd line.


Answer 5

If you have a newer version of the Anaconda Navigator, open the Anaconda Prompt program that came in the install. Type all the usual conda update/conda install commands there.

I think the answers above explain this, but I could have used a very simple instruction like this. Perhaps it will help others.


Answer 6

In addition to adding C:\Users\yourusername\Anaconda3 and C:\Users\yourusername\Anaconda3\Scripts, as recommended by Raja (above), also add C:\Users\yourusername\Anaconda3\Library\bin to your path variable. This will prevent an SSL error that is bound to happen if you’re performing this on a fresh install of Anaconda.


Answer 7

Go to the Anaconda Prompt (type “anaconda” in the search box on your laptop) and type the following command:

where conda

Add that location to your Path environment variable. Close cmd and open it again.


Answer 8

If you don’t want to add Anaconda to the PATH environment variable and you are using Windows, try this:

  • Open cmd;
  • Change to your installation’s Scripts folder. It’s something like: C:\Users\your_home_folder\Anaconda3\Scripts
  • Test Anaconda, for example by typing conda --version.
  • Update Anaconda: conda update conda, conda update --all, or conda update anaconda.

Update Spyder:

  • conda update qt pyqt
  • conda update spyder

Answer 9

I have Windows 10 64-bit and this worked for me. This solution can work for both the Anaconda and Miniconda distributions.

  1. First of all, uninstall the anaconda/miniconda installation that is causing the problem.
  2. After that, delete the ‘.anaconda’ and ‘.conda’ folders from ‘C:\Users\’
  3. If you have any antivirus software installed, try to exclude all the folders and subfolders inside ‘C:\ProgramData\Anaconda3\’ from

    • Behaviour detection.
    • Virus detection.
    • DNA scan.
    • Suspicious files scan.
    • Any other virus protection mode.

    *(Note: ‘C:\ProgramData\Anaconda3’ is the default installation folder. If you choose a different destination at the installer prompt, exclude that path instead.)*

  4. Now install Anaconda with admin privileges.
    • Set the installation path to ‘C:\ProgramData\Anaconda3’, or specify a custom path; just remember that it should not contain any white space and that it should be excluded from virus detection.
    • At Advanced Installation Options you can check “Add Anaconda to my PATH environment variable (optional)” and “Register Anaconda as my default Python 3.6”
    • Install it with the remaining default settings. Click Finish when done.
    • Restart your computer.

Now open Command Prompt or Anaconda Prompt and check the installation using the following command:

conda list

If you get a package list, then anaconda/miniconda is successfully installed.


Answer 10

This problem arose for me when I installed Anaconda multiple times. I was careful to do an uninstall but there are some things that the uninstall process doesn’t undo.

In my case, I needed to remove a file Microsoft.PowerShell_profile.ps1 from ~\Documents\WindowsPowerShell\. I identified that this file was the culprit by opening it in a text editor. I saw that it referenced the old installation location C:\Anaconda3\.
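
Instead of reading the profile by eye, the same check can be sketched in Python: list the lines of a text that mention a stale install location. The profile contents below are a made-up example, and `lines_referencing` is a hypothetical helper; on a real machine you would read the actual profile file into `profile_text`.

```python
def lines_referencing(text, stale_path):
    """Return (line number, line) pairs that mention `stale_path`, ignoring case."""
    needle = stale_path.lower()
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


# Hypothetical profile contents, for illustration only.
profile_text = r'''# my PowerShell profile
Set-Alias ll Get-ChildItem
(& "C:\Anaconda3\Scripts\conda.exe" "shell.powershell" "hook") | Out-String | Invoke-Expression
'''

for number, line in lines_referencing(profile_text, "C:\\Anaconda3\\"):
    print(f"line {number}: {line}")
```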


Answer 11

I have just launched anaconda-navigator and run the conda commands from there.


Answer 12

I had this problem on Windows. Most of the answers do not follow what Anaconda recommends: you should not add the path to the environment variables, as it can break other things. Instead, you should use the Anaconda Prompt, as mentioned in the top answer.

However, this may also break. In that case, right-click on the shortcut, go to the Shortcut tab, and check that the target value reads something like:

%windir%\System32\cmd.exe "/K" C:\Users\myUser\Anaconda3\Scripts\activate.bat C:\Users\myUser\Anaconda3