如何将SQL查询结果转换为PANDAS数据结构？

Question 1

Any help on this problem will be greatly appreciated.

So basically I want to run a query to my SQL database and store the returned data as Pandas data structure.

I have attached code for query.

I am reading the documentation on Pandas, but I have problem to identify the return type of my query.

I tried to print the query result, but it doesn’t give any useful information.

Thanks!!!!

from sqlalchemy import create_engine

engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()
dataid = 1022
resoverall = connection2.execute("
  SELECT 
      sum(BLABLA) AS BLA,
      sum(BLABLABLA2) AS BLABLABLA2,
      sum(SOME_INT) AS SOME_INT,
      sum(SOME_INT2) AS SOME_INT2,
      100*sum(SOME_INT2)/sum(SOME_INT) AS ctr,
      sum(SOME_INT2)/sum(SOME_INT) AS cpc
   FROM daily_report_cooked
   WHERE campaign_id = '%s'", %dataid)

So I sort of want to understand what’s the format/datatype of my variable “resoverall” and how to put it with PANDAS data structure.

Question 2

Here’s the shortest code that will do the job:

from pandas import DataFrame
df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()

You can go fancier and parse the types as in Paul’s answer.

Question 3

Edit: Mar. 2015

As noted below, pandas now uses SQLAlchemy to both read from (read_sql) and insert into (to_sql) a database. The following should work

import pandas as pd

df = pd.read_sql(sql, cnxn)

Previous answer: Via mikebmassey from a similar question

import pyodbc
import pandas.io.sql as psql

cnxn = pyodbc.connect(connection_info) 
cursor = cnxn.cursor()
sql = "SELECT * FROM TABLE"

df = psql.frame_query(sql, cnxn)
cnxn.close()

Question 4

If you are using SQLAlchemy’s ORM rather than the expression language, you might find yourself wanting to convert an object of type sqlalchemy.orm.query.Query to a Pandas data frame.

The cleanest approach is to get the generated SQL from the query’s statement attribute, and then execute it with pandas’s read_sql() method. E.g., starting with a Query object called query:

df = pd.read_sql(query.statement, query.session.bind)

Question 5

Edit 2014-09-30:

pandas now has a read_sql function. You definitely want to use that instead.

Original answer:

I can’t help you with SQLAlchemy — I always use pyodbc, MySQLdb, or psychopg2 as needed. But when doing so, a function as simple as the one below tends to suit my needs:

import decimal

import pydobc
import numpy as np
import pandas

cnn, cur = myConnectToDBfunction()
cmd = "SELECT * FROM myTable"
cur.execute(cmd)
dataframe = __processCursor(cur, dataframe=True)

def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.

    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        if col[1] == unicode:
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == str:
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O4'))
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))

    data = []
    for row in cur:
        data.append(tuple(row))

    array = np.array(data, dtype=datatypes)
    if dataframe:
        output = pandas.DataFrame.from_records(array)

        if index is not None:
            output = output.set_index(index)

    else:
        output = array

    return output

Question 6

MySQL Connector

For those that works with the mysql connector you can use this code as a start. (Thanks to @Daniel Velkov)

Used refs:

import pandas as pd
import mysql.connector

# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",              # your host, usually localhost
    user="<USER>",            # your username
    password="<PASS>",        # your password
    database="<DATABASE>"     # name of the data base
)   

# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()

# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")

# Put it all to a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names

# Close the session
db.close()

# Show the data
print(sql_data.head())

Question 7

Here’s the code I use. Hope this helps.

import pandas as pd
from sqlalchemy import create_engine

def getData():
  # Parameters
  ServerName = "my_server"
  Database = "my_db"
  UserPwd = "user:pwd"
  Driver = "driver=SQL Server Native Client 11.0"

  # Create the connection
  engine = create_engine('mssql+pyodbc://' + UserPwd + '@' + ServerName + '/' + Database + "?" + Driver)

  sql = "select * from mytable"
  df = pd.read_sql(sql, engine)
  return df

df2 = getData()
print(df2)

Question 8

This is a short and crisp answer to your problem:

from __future__ import print_function
import MySQLdb
import numpy as np
import pandas as pd
import xlrd

# Connecting to MySQL Database
connection = MySQLdb.connect(
             host="hostname",
             port=0000,
             user="userID",
             passwd="password",
             db="table_documents",
             charset='utf8'
           )
print(connection)
#getting data from database into a dataframe
sql_for_df = 'select * from tabledata'
df_from_database = pd.read_sql(sql_for_df , connection)

Question 9

1. Using MySQL-connector-python

# pip install mysql-connector-python

import mysql.connector
import pandas as pd

mydb = mysql.connector.connect(
    host = 'host',
    user = 'username',
    passwd = 'pass',
    database = 'db_name'
)
query = 'select * from table_name'
df = pd.read_sql(query, con = mydb)
print(df)

2. Using SQLAlchemy

# pip install pymysql
# pip install sqlalchemy

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('mysql+pymysql://username:password@localhost:3306/db_name')

query = '''
select * from table_name
'''
df = pd.read_sql_query(query, engine)
print(df)

Question 10

Like Nathan, I often want to dump the results of a sqlalchemy or sqlsoup Query into a Pandas data frame. My own solution for this is:

query = session.query(tbl.Field1, tbl.Field2)
DataFrame(query.all(), columns=[column['name'] for column in query.column_descriptions])

Question 11

resoverall is a sqlalchemy ResultProxy object. You can read more about it in the sqlalchemy docs, the latter explains basic usage of working with Engines and Connections. Important here is that resoverall is dict like.

Pandas likes dict like objects to create its data structures, see the online docs

Good luck with sqlalchemy and pandas.

Question 12

Simply use pandas and pyodbc together. You’ll have to modify your connection string (connstr) according to your database specifications.

import pyodbc
import pandas as pd

# MSSQL Connection String Example
connstr = "Server=myServerAddress;Database=myDB;User Id=myUsername;Password=myPass;"

# Query Database and Create DataFrame Using Results
df = pd.read_sql("select * from myTable", pyodbc.connect(connstr))

I’ve used pyodbc with several enterprise databases (e.g. SQL Server, MySQL, MariaDB, IBM).

Question 13

This question is old, but I wanted to add my two-cents. I read the question as ” I want to run a query to my [my]SQL database and store the returned data as Pandas data structure [DataFrame].”

From the code it looks like you mean mysql database and assume you mean pandas DataFrame.

import MySQLdb as mdb
import pandas.io.sql as sql
from pandas import *

conn = mdb.connect('<server>','<user>','<pass>','<db>');
df = sql.read_frame('<query>', conn)

For example,

conn = mdb.connect('localhost','myname','mypass','testdb');
df = sql.read_frame('select * from testTable', conn)

This will import all rows of testTable into a DataFrame.

Question 14

Here is mine. Just in case if you are using “pymysql”:

import pymysql
from pandas import DataFrame

host   = 'localhost'
port   = 3306
user   = 'yourUserName'
passwd = 'yourPassword'
db     = 'yourDatabase'

cnx    = pymysql.connect(host=host, port=port, user=user, passwd=passwd, db=db)
cur    = cnx.cursor()

query  = """ SELECT * FROM yourTable LIMIT 10"""
cur.execute(query)

field_names = [i[0] for i in cur.description]
get_data = [xx for xx in cur]

cur.close()
cnx.close()

df = DataFrame(get_data)
df.columns = field_names

Question 15

pandas.io.sql.write_frame is DEPRECATED. https://pandas.pydata.org/pandas-docs/version/0.15.2/generated/pandas.io.sql.write_frame.html

Should change to use pandas.DataFrame.to_sql https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

There is another solution. PYODBC to Pandas – DataFrame not working – Shape of passed values is (x,y), indices imply (w,z)

As of Pandas 0.12 (I believe) you can do:

import pandas
import pyodbc

sql = 'select * from table'
cnn = pyodbc.connect(...)

data = pandas.read_sql(sql, cnn)

Prior to 0.12, you could do:

import pandas
from pandas.io.sql import read_frame
import pyodbc

sql = 'select * from table'
cnn = pyodbc.connect(...)

data = read_frame(sql, cnn)

Question 16

Long time from last post but maybe it helps someone…

Shorted way than Paul H:

my_dic = session.query(query.all())
my_df = pandas.DataFrame.from_dict(my_dic)

Question 17

best way I do this

db.execute(query) where db=db_class() #database class
    mydata=[x for x in db.fetchall()]
    df=pd.DataFrame(data=mydata)

Question 18

If the result type is ResultSet, you should convert it to dictionary first. Then the DataFrame columns will be collected automatically.

This works on my case:

df = pd.DataFrame([dict(r) for r in resoverall])

如何将SQL查询结果转换为PANDAS数据结构？

问题：如何将SQL查询结果转换为PANDAS数据结构？

回答 0

回答 1

回答 2

回答 3

编辑2014-09-30：

原始答案：

Edit 2014-09-30:

Original answer:

回答 4

MySQL连接器

MySQL Connector

回答 5

回答 6

回答 7

1.使用MySQL-connector-python

2.使用SQLAlchemy

1. Using MySQL-connector-python

2. Using SQLAlchemy

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

回答 16

相关文章

排行榜展示

文章展示