Question: Google Colab: how to read data from my Google Drive?

The problem is simple: I have some data on gDrive, for example at /projects/my_project/my_data*.

Also I have a simple notebook in gColab.

So, I would like to do something like:

for file in glob.glob("/projects/my_project/my_data*"):
    do_something(file)

Unfortunately, all the examples (this one, for instance: https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb) suggest manually loading all the necessary data into the notebook.

But if I have a lot of data files, that can get quite complicated. Is there any way to solve this?

Thanks for the help!


Answer 0

Good news: PyDrive has first-class support on Colab! PyDrive is a wrapper for the Google Drive Python client. Here is an example of how you would download ALL files from a folder, similar to using glob + *:

!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# choose a local (colab) directory to store the data.
local_download_path = os.path.expanduser('~/data')
os.makedirs(local_download_path, exist_ok=True)

# 2. Auto-iterate using the query syntax
#    https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile(
    {'q': "'1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk' in parents"}).GetList()

for f in file_list:
  # 3. Create & download by id.
  print('title: %s, id: %s' % (f['title'], f['id']))
  fname = os.path.join(local_download_path, f['title'])
  print('downloading to {}'.format(fname))
  f_ = drive.CreateFile({'id': f['id']})
  f_.GetContentFile(fname)


# Print the contents of the last downloaded file (assumes it is a text file).
with open(fname, 'r') as downloaded:
  print(downloaded.read())

Notice that the argument to drive.ListFile is a dictionary whose keys match the parameters used by the Google Drive HTTP API (you can customize the q parameter to suit your use case).
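As a rough illustration of customizing the q parameter, the sketch below assembles a query string from a folder id. The helper name children_query and its parameters are hypothetical; the clause forms ('<id>' in parents, trashed=false, mimeType='...') follow the Drive search syntax linked in the code above.

```python
def children_query(folder_id, mime_type=None, include_trashed=False):
    """Build a Drive 'q' string that lists the direct children of a folder.

    Hypothetical helper: only the resulting string is passed to PyDrive.
    """
    clauses = ["'{}' in parents".format(folder_id)]
    if not include_trashed:
        # Skip files that sit in the Drive trash.
        clauses.append("trashed=false")
    if mime_type is not None:
        # Restrict the listing to a single MIME type.
        clauses.append("mimeType='{}'".format(mime_type))
    return " and ".join(clauses)


query = children_query("1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk", mime_type="text/plain")
print(query)
```

The resulting string would then be passed as drive.ListFile({'q': query}).GetList().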

Keep in mind that Google Drive identifies files and folders by id (note the 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk above). You therefore need to look up the id of the folder you want to root your search in.

For example, navigate to the folder "/projects/my_project/my_data" that is located in your Google Drive.

[Image: Google Drive]

The folder contains some files that we want to download to Colab. To get the id of the folder for use with PyDrive, look at the URL and extract the id. In this case, the URL corresponding to the folder was:

https://drive.google.com/drive/folders/1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk

Where the id is the last piece of the url: 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk.
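If you would rather not copy the id by hand, a small helper can pull it out of the link. This is only a sketch: folder_id_from_url is a hypothetical name, and it assumes the link either ends in the id (as in the URL above) or carries it as an id= query parameter.

```python
from urllib.parse import urlparse, parse_qs


def folder_id_from_url(url):
    """Return the Drive id embedded in a folder link (hypothetical helper)."""
    parsed = urlparse(url)
    # Some links carry the id as a query parameter: ...?id=<ID>
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]
    # Folder links end with the id: .../drive/folders/<ID>
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        raise ValueError("no id found in URL: %r" % url)
    return segments[-1]


folder_id = folder_id_from_url(
    "https://drive.google.com/drive/folders/1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk")
print(folder_id)
```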


Answer 1

Edit: As of February 2020, there’s a first-class UI for automatically mounting Drive.

First, open the file browser on the left-hand side. It will show a ‘Mount Drive’ button. Once you click it, you’ll see a permissions prompt to mount Drive, and afterwards your Drive files will be available with no further setup when you return to the notebook. The completed flow looks like this:

[Image: Drive auto-mount example]

The original answer follows. (This will also still work for shared notebooks.)

You can mount your Google Drive files by running the following code snippet:

from google.colab import drive
drive.mount('/content/drive')

Then, you can interact with your Drive files in the file browser side panel or using command-line utilities.

Here’s an example notebook


Answer 2

Thanks for the great answers! The fastest way to get a few one-off files from Google Drive into Colab is to load the Drive helper and mount the drive:

from google.colab import drive

The mount call below will prompt for authorization.

drive.mount('/content/drive')

Open the link in a new tab, copy the code you are given, and paste it back into the prompt. You now have access to Google Drive. Check:

!ls "/content/drive/My Drive"

Then copy file(s) as needed:

!cp "/content/drive/My Drive/xy.py" "xy.py"

Confirm that the files were copied:

!ls
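The same copy step can also be done from Python rather than with shell commands. The following is a minimal sketch, assuming the drive is already mounted at the path used above; copy_from_drive is a hypothetical helper name.

```python
import os
import shutil


def copy_from_drive(filenames, drive_dir="/content/drive/My Drive", dest_dir="."):
    """Copy the named files from the mounted Drive folder into dest_dir.

    Hypothetical helper; drive_dir defaults to the mount path from the answer.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in filenames:
        source = os.path.join(drive_dir, name)
        # shutil.copy returns the path of the newly created file.
        copied.append(shutil.copy(source, dest_dir))
    return copied


# Example call inside a notebook, after drive.mount('/content/drive'):
# copy_from_drive(["xy.py"])
```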

Answer 3

Most of the previous answers are a bit (very) complicated.

from google.colab import drive
drive.mount("/content/drive", force_remount=True)

I found this to be the easiest and fastest way to mount Google Drive in Colab. You can change the mount directory to whatever you want by changing the first argument to drive.mount. It will give you a link to grant permissions with your account; copy and paste the generated key, and the drive will be mounted at the selected path.

force_remount is only needed when you want to mount the drive regardless of whether it was mounted previously. You can omit this parameter if you don’t want to force a remount.
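If the same notebook is sometimes run outside Colab, the mount call can be wrapped so that it is skipped when google.colab is not importable. This is a sketch under that assumption; mount_drive is a hypothetical wrapper, while drive.mount and force_remount are the calls shown above.

```python
def mount_drive(mount_point="/content/drive", force_remount=False):
    """Mount Drive when running in Colab; return True if a mount was attempted.

    Hypothetical wrapper around drive.mount from the answer above.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        # Not running in Colab, so there is nothing to mount.
        return False
    drive.mount(mount_point, force_remount=force_remount)
    return True
```

Calling mount_drive(force_remount=True) in Colab behaves like the snippet above; elsewhere it simply returns False.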

Edit: Check this out to find more ways of doing the IO operations in colab https://colab.research.google.com/notebooks/io.ipynb


Answer 4

You can’t permanently store a file on Colab. You can, however, import files from your Drive, and save a file back to Drive each time you are done with it.

To mount Google Drive in your Colab session:

from google.colab import drive
drive.mount('/content/gdrive')

You can then write to Google Drive just as you would to a local file system. Your Drive will appear in the Files tab, and you can read and write any file from Colab. Changes are made on your Drive in real time, and anyone with the access link to a file can see the changes you make from Colab.

Example

with open('/content/gdrive/My Drive/filename.txt', 'w') as f:
   f.write('values')
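Reading works the same way as writing. The sketch below pairs a write with a read-back, assuming the mount point /content/gdrive from this answer; save_text and load_text are hypothetical helper names.

```python
import os

DRIVE_DIR = "/content/gdrive/My Drive"  # mount location used in this answer


def save_text(filename, text, drive_dir=DRIVE_DIR):
    """Write text to a file in the mounted Drive folder and return its path."""
    path = os.path.join(drive_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


def load_text(filename, drive_dir=DRIVE_DIR):
    """Read a text file back from the mounted Drive folder."""
    with open(os.path.join(drive_dir, filename), "r", encoding="utf-8") as f:
        return f.read()


# Example calls inside a notebook, after drive.mount('/content/gdrive'):
# save_text("filename.txt", "values")
# print(load_text("filename.txt"))
```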

Answer 5

I’m lazy and my memory is bad, so I decided to create easycolab which is easier to memorize and type:

import easycolab as ec
ec.mount()

Make sure to install it first: !pip install easycolab

The mount() method basically implements this:

from google.colab import drive
drive.mount('/content/drive')
%cd '/content/drive/My Drive/'

Answer 6

You can simply make use of the code snippets panel on the left of the screen.

Insert the “Mounting Google Drive in your VM” snippet.

Run the code, open the URL it prints, and copy and paste the authorization code back into the notebook.

Then use !ls to check the directories:

!ls /gdrive

In most cases, you will find what you want in the directory “/gdrive/My Drive”.

Then you can carry it out like this:

from google.colab import drive
drive.mount('/gdrive')
import glob

file_path = glob.glob("/gdrive/My Drive/*.txt")
for file in file_path:
    do_something(file)
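When the data is spread over subfolders, the same idea extends to a recursive glob. This is a minimal sketch assuming the /gdrive mount point used above; find_files is a hypothetical helper, and the '**' pattern relies on glob's recursive=True option.

```python
import glob
import os


def find_files(pattern, root="/gdrive/My Drive", recursive=True):
    """Return sorted paths under root that match pattern.

    Hypothetical helper; with recursive=True a '**' component also matches
    files in nested subfolders.
    """
    # Escape the root so special characters in folder names are taken literally.
    full_pattern = os.path.join(glob.escape(root), pattern)
    return sorted(glob.glob(full_pattern, recursive=recursive))


# Example call inside a notebook, after drive.mount('/gdrive'):
# for path in find_files("**/*.txt"):
#     do_something(path)
```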

Answer 7

What I did first was:

from google.colab import drive
drive.mount('/content/drive/')

Then:

%cd /content/drive/My Drive/Colab Notebooks/

After that I can, for example, read CSV files with:

import pandas as pd
df = pd.read_csv("data_example.csv")

If your files are in a different location, just add the correct path after My Drive.
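An alternative to changing the working directory is to build the full path and pass it to the reader directly. The sketch below assumes the mount point from this answer; drive_path and DRIVE_ROOT are hypothetical names.

```python
import os

DRIVE_ROOT = "/content/drive/My Drive"  # mount location used in this answer


def drive_path(*parts, root=DRIVE_ROOT):
    """Join folder and file names onto the mounted Drive root (hypothetical helper)."""
    return os.path.join(root, *parts)


csv_path = drive_path("Colab Notebooks", "data_example.csv")
print(csv_path)
```

With pandas imported as pd, pd.read_csv(csv_path) would then read the same file without a %cd.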


Answer 8

I wrote a class that downloads all of the data to the ‘.’ location (the current directory) on the Colab server.

The whole thing can be pulled from here: https://github.com/brianmanderson/Copy-Shared-Google-to-Colab

!pip install PyDrive


from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
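import os

# Authenticate and create the PyDrive client that the class below uses as `drive`.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)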

class download_data_from_folder(object):
    def __init__(self,path):
        path_id = path[path.find('id=')+3:]
        self.file_list = self.get_files_in_location(path_id)
        self.unwrap_data(self.file_list)
    def get_files_in_location(self,folder_id):
        file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format(folder_id)}).GetList()
        return file_list
    def unwrap_data(self,file_list,directory='.'):
        for i, file in enumerate(file_list):
            print(str((i + 1) / len(file_list) * 100) + '% done copying')
            if file['mimeType'].find('folder') != -1:
                if not os.path.exists(os.path.join(directory, file['title'])):
                    os.makedirs(os.path.join(directory, file['title']))
                print('Copying folder ' + os.path.join(directory, file['title']))
                self.unwrap_data(self.get_files_in_location(file['id']), os.path.join(directory, file['title']))
            else:
                if not os.path.exists(os.path.join(directory, file['title'])):
                    downloaded = drive.CreateFile({'id': file['id']})
                    downloaded.GetContentFile(os.path.join(directory, file['title']))
        return None
# Replace with the shared folder link; the class expects it to contain 'id=<folder id>'.
data_path = 'shared_path_location'
download_data_from_folder(data_path)

Answer 9

For example, to extract a zip file stored on Google Drive from a Colab notebook:

import zipfile
from google.colab import drive

drive.mount('/content/drive/')

zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
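The same extraction can be written with a context manager, which closes the archive even if extraction fails. This is a small sketch; extract_zip is a hypothetical helper and the paths in the example call are the ones from the answer.

```python
import zipfile


def extract_zip(zip_path, target_dir="/tmp"):
    """Extract every member of zip_path into target_dir and return their names."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Example call inside a notebook, after drive.mount('/content/drive/'):
# extract_zip("/content/drive/My Drive/ML/DataSet.zip", "/tmp")
```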

Answer 10

@wenkesj

I am talking about copying a directory and all its subdirectories.

I found a solution that looks like this:

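import os

def copy_directory(source_id, local_target):
  # Assumes `drive` is an authenticated PyDrive GoogleDrive client.
  os.makedirs(local_target, exist_ok=True)
  file_list = drive.ListFile(
    {'q': "'{source_id}' in parents".format(source_id=source_id)}).GetList()
  for f in file_list:
    # The original line was truncated; this print is a reconstruction that
    # logs the metadata fields named in the surviving fragment.
    print(', '.join('{}: {}'.format(key, f[key])
                    for key in ['title', 'id', 'mimeType']))
    # Skip hidden entries.
    if f["title"].startswith("."):
      continue
    fname = os.path.join(local_target, f['title'])
    if f['mimeType'] == 'application/vnd.google-apps.folder':
      # Recurse into subfolders.
      copy_directory(f['id'], fname)
    else:
      f_ = drive.CreateFile({'id': f['id']})
      f_.GetContentFile(fname)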

Nevertheless, it looks like gDrive doesn’t like copying too many files.
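One way to check the recursion without touching Drive is to separate the traversal from the client. In the sketch below, walk_drive is a hypothetical generator that takes any function returning a folder's children; the dictionary fake_tree stands in for Drive purely for illustration.

```python
import os

FOLDER_MIME = "application/vnd.google-apps.folder"


def walk_drive(list_children, folder_id, prefix=""):
    """Yield (relative_path, file_id) for every non-folder item under folder_id.

    list_children is any callable mapping a folder id to a list of dicts with
    'title', 'id' and 'mimeType' keys (the fields used in the answer above).
    """
    for item in list_children(folder_id):
        path = os.path.join(prefix, item["title"])
        if item["mimeType"] == FOLDER_MIME:
            # Recurse into the subfolder, extending the relative path.
            yield from walk_drive(list_children, item["id"], path)
        else:
            yield path, item["id"]


# Stand-in listing used only for illustration.
fake_tree = {
    "root": [{"title": "a.txt", "id": "1", "mimeType": "text/plain"},
             {"title": "sub", "id": "2", "mimeType": FOLDER_MIME}],
    "2": [{"title": "b.txt", "id": "3", "mimeType": "text/plain"}],
}
print(list(walk_drive(fake_tree.__getitem__, "root")))
```

With PyDrive, list_children could be a small function that calls drive.ListFile with a "'<id>' in parents" query and returns GetList(), and each yielded id could then be downloaded with CreateFile and GetContentFile as in the answer.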


Answer 11

There are many ways to read files in your Colab notebook (.ipynb). A few are:

  1. Mounting your Google Drive in the runtime’s virtual machine.
  2. Using google.colab.files.upload(), the easiest solution.
  3. Using the native REST API.
  4. Using a wrapper around the API, such as PyDrive.

Methods 1 and 2 worked for me; I wasn’t able to figure out the rest. If anyone can, as others tried in the posts above, please write an elegant answer. Thanks in advance!

First method:

I wasn’t able to mount my Google Drive directly, so I installed these libraries:

# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass

!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Once the installation and authorization process is finished, mount your drive:

!mkdir -p drive
!google-drive-ocamlfuse drive

After installation I was able to mount Google Drive. Everything in your Google Drive is now under /content/drive:

!ls /content/drive/ML/../../../../path_to_your_folder/

Now you can read a file from the path_to_your_folder folder into pandas using the above path:

import pandas as pd
df = pd.read_json('drive/ML/../../../../path_to_your_folder/file.json')
df.head(5)

You are supposed to use the absolute path you obtained, without the /../.. segments.

Second method:

This is convenient if the file you want to read is in the current working directory.

If you need to upload files from your local file system, you can use the code below; otherwise skip it.

from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Suppose you have the following folder hierarchy in your Google Drive:

/content/drive/ML/../../../../path_to_your_folder/

Then you simply need the code below to load the file into pandas:

import pandas as pd
import io
df = pd.read_json(io.StringIO(uploaded['file.json'].decode('utf-8')))
df
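The key point in the second method is that each uploaded file arrives as raw bytes keyed by its name, so it has to be decoded before parsing. The sketch below shows that step with the standard json module instead of pandas; load_uploaded_json is a hypothetical helper, and the uploaded dictionary is a stand-in for what files.upload() returns.

```python
import io
import json


def load_uploaded_json(uploaded, name):
    """Decode one uploaded file (bytes) and parse it as JSON."""
    text = uploaded[name].decode("utf-8")
    # Wrap the text in a file-like object, as the pandas example above does.
    return json.load(io.StringIO(text))


# Stand-in for the dict returned by files.upload(): file name -> bytes.
uploaded = {"file.json": b'{"a": [1, 2, 3]}'}
print(load_uploaded_json(uploaded, "file.json"))
```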

Answer 12

To read all files in a folder:

import glob
from google.colab import drive
drive.mount('/gdrive', force_remount=True)

#!ls "/gdrive/My Drive/folder"

files = glob.glob(f"/gdrive/My Drive/folder/*.txt")
for file in files:  
  do_something(file)

Answer 13

from google.colab import drive
drive.mount('/content/drive')

This worked perfectly for me. Afterwards I was able to use the os library to access my files just as I do on my PC.


Answer 14

Consider just downloading the file via its permanent link with gdown, which comes preinstalled.
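As a rough sketch of what that could look like, the helper below builds a direct-download link from a file id. The name drive_download_url is hypothetical, and the uc?id= URL form is an assumption about the link format gdown is usually given, not something stated in this answer.

```python
def drive_download_url(file_id):
    """Build a direct-download link for a Drive file id.

    Assumption: the 'uc?id=' form is the link style commonly passed to gdown.
    """
    return "https://drive.google.com/uc?id={}".format(file_id)


url = drive_download_url("FILE_ID")  # FILE_ID is a placeholder, not a real id
print(url)
```

In a notebook where gdown is available, something like gdown.download(url, "data.zip", quiet=False) would then fetch the file, assuming it is shared so that anyone with the link can read it.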

