Conversation
Is this pull request going to be merged? It looks useful, and I wondered if anyone has tested it. I'm new to Redshift and trying to build a process for writing both small and big tables into Redshift; I guess two approaches will be needed. I'm also having trouble with search_path settings: written tables always end up in the public schema, because the "." in the qualified name is ignored, so "test.test" gets written to the public schema as a table literally named test.test rather than as table "test" in the "test" schema. Any tips on this would be greatly appreciated!
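For what it's worth, the schema problem above comes from passing the qualified name as a single string: DBI quotes the whole string as one identifier. A minimal sketch of the difference, using DBI's `ANSI()` dummy connection so it runs without a database (the `con` and `my_df` names in the comments are hypothetical):

```r
library(DBI)

# A plain string is quoted as ONE identifier, so the "." is not treated
# as a schema separator and the table lands in the search_path default
# (usually "public"):
as.character(dbQuoteIdentifier(ANSI(), "test.test"))
#> "\"test.test\""

# DBI::Id() qualifies the schema explicitly:
as.character(dbQuoteIdentifier(ANSI(), Id(schema = "test", table = "test")))
#> "\"test\".\"test\""

# With a live connection (hypothetical `con`), the same Id object works
# with dbWriteTable:
# dbWriteTable(con, Id(schema = "test", table = "test"), my_df)
# or, alternatively, change the session search_path:
# dbExecute(con, "SET search_path TO test")
```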
Hi, thanks for the reminder. I think the biggest issue is that it assumes you're inserting textual data and writing all columns at once. It would be much better/more reliable to handle the specific columns being inserted and generate the string appropriately; currently, including the escape character would break the insert. As for inserting data, our experience has been that using …
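The escaping issue Paul describes can be sketched in a few lines: doubling single quotes is the standard SQL way to escape string literals, and taking an explicit column list avoids assuming every column is written. The function names here (`escape_sql_value`, `build_insert`) are illustrative, not part of this PR:

```r
# Escape a single value for use in a generated INSERT statement.
# Without the gsub, a value like "O'Brien" would break the query.
escape_sql_value <- function(x) {
  if (is.na(x)) return("NULL")
  if (is.numeric(x)) return(as.character(x))
  paste0("'", gsub("'", "''", as.character(x)), "'")
}

# Build a multi-row INSERT for an explicit subset of columns.
build_insert <- function(table, df, cols = names(df)) {
  rows <- apply(df[, cols, drop = FALSE], 1, function(r)
    paste0("(", paste(vapply(r, escape_sql_value, character(1)),
                      collapse = ", "), ")"))
  paste0("INSERT INTO ", table, " (", paste(cols, collapse = ", "),
         ") VALUES ", paste(rows, collapse = ", "))
}

build_insert("lookup", data.frame(id = "1", name = "O'Brien"))
#> "INSERT INTO lookup (id, name) VALUES ('1', 'O''Brien')"
```

This is still only sensible for the small lookup-table case discussed below; for anything large, row-by-row (or even multi-row) INSERTs are slow on Redshift.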
Agree with Paul here. Bulk uploads are very inefficient using the above. I wrote this primarily so I wouldn't need a separate piece of code/script to upload small lookup files to Redshift, each no more than a few MB (single digits!).
Thanks for the info! I have now heard that the AWS best practice is:

R data frame ---> S3 ---> Redshift (via a COPY command)

I see this function from the redshiftTools package to automate the process:

```r
redshiftTools::rs_replace_table(my_data, dbcon = con,
                                tableName = 'mytable', bucket = "mybucket")
```

(source: https://www.r-bloggers.com/how-to-bulk-upload-your-data-from-r-into-redshift/)

Can either of you recommend this or another function to perform the data-frame-to-Redshift transfer for large tables? I'm new to Redshift in a new position and trying to figure out a production-quality pipeline for pushing transformed data out of an R data frame into Redshift.

Thank you very much for any tips!
Derek
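For reference, the dataframe -> S3 -> COPY route that redshiftTools automates can also be done by hand. The sketch below builds the Redshift `COPY` statement (CSV, gzipped, header skipped, authenticated via an IAM role); the bucket name, IAM role ARN, and `copy_stmt` helper are all assumptions for illustration, and the commented steps need a live AWS/Redshift setup:

```r
# Build a Redshift COPY statement loading a gzipped CSV from S3.
copy_stmt <- function(table, s3_path, iam_role) {
  paste0("COPY ", table, " FROM '", s3_path, "'\n",
         "IAM_ROLE '", iam_role, "'\n",
         "FORMAT AS CSV GZIP IGNOREHEADER 1")
}

# Full flow (requires real credentials; hypothetical names throughout):
# write.csv(my_data, gzfile("my_data.csv.gz"), row.names = FALSE)
# aws.s3::put_object("my_data.csv.gz", object = "my_data.csv.gz",
#                    bucket = "mybucket")
# DBI::dbExecute(con, copy_stmt("mytable", "s3://mybucket/my_data.csv.gz",
#                               "arn:aws:iam::123456789012:role/redshift-load"))

cat(copy_stmt("mytable", "s3://mybucket/my_data.csv.gz",
              "arn:aws:iam::123456789012:role/redshift-load"))
```

COPY loads the file in parallel across the cluster's slices, which is why it scales so much better than generated INSERT statements for large tables.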
On Fri, Jan 20, 2017 at 10:51 AM, Eeshan Chatterjee wrote:
Added a function to insert an R data.frame into a Redshift table; the INSERT query is generated automatically.