OCI-GovCloud Migration: Copy from S3 to OCI

We are migrating from AWS to Oracle Cloud Infrastructure (OCI) during the fall/winter of 2023.

The objective during this stage of our migration is to move the database pump file from AWS to OCI.

    What Did Work

    This process is awkward, slow, and rather manual. As a positive, it is entirely transparent too. I do NOT care that it is awkward, slow, and rather manual because we have to do each schema just once. And we have to do 1 production schema twice (once to practice and configure and once on the go-live weekend).

    Success ranks higher than efficiency when the task is not repeated. Again, if you have comments, write an article and share your experiences. Two experienced professionals spent 2 full work days experimenting and failing before finding success. I’ll review the failures below.

    Each to their Strength

    AWS CLI

    The AWS CLI works brilliantly well and is robust, but installs best on operating systems that AWS loves such as the Linux versions that AWS supports on their EC2 instances. Get that configured with your credentials and boom, you can copy files from an EC2 instance to an AWS S3 bucket. Perfect.

    AWS CLI did not work with Oracle’s Object Storage. Our failures included:

    • could not connect to endpoint URL
    • 401/authorization

    My colleague, Eli Duvall, and I worked very hard to follow Ron Ekins’ article on OCI Object Storage using the AWS CLI. We came close. Our instance is built within FedRAMP/GovCloud, so there are more security constraints. Maybe that is a reason?

    What success did we have? We could copy a large database pump file (dmp) from an EC2 instance to an AWS S3 bucket.

    OCI CLI

    We tried to install OCI CLI on the AWS EC2 instance. What a failure. It appeared that our Linux version on AWS was not supported by Oracle’s CLI. Hummm?

    What does OCI CLI install well on? Correct. On an Oracle Linux server located at OCI. So we did that. We could copy and list files on Object Storage Buckets from the Linux Command line.

    The Gap

    AWS CLI worked well within AWS.

    OCI CLI worked well within OCI.

    What we needed was to install either the OCI CLI on an AWS server or the AWS CLI on an Oracle Linux server. We failed at both. Errors and time-wasting articles robbed us of hours. We finally installed a third-party tool called s3cmd on our Oracle Linux server.

    https://www.howtogeek.com/devops/how-to-sync-files-from-linux-to-amazon-s3

    This worked great.

    The Workflow

    Step 1: Database Pump

    Find the folder/directory where the backup files (DMP) will land. First, use a query to find the directory name and its path:

    select * from all_directories;

    For me, the desired directory is:

    data_pump_dir => /oracle/admin/proddb/dpdump/

    On your AWS Linux or Microsoft server that hosts the Oracle database, run your database pump command.

    sudo su - oracle 
    cd /oracle/admin/proddb/dpdump
    expdp fred/secret_password dumpfile=my_app_20231107.DMP logfile=my_app_20231107.log schemas=MY_APP

    This file now sits in your designated Oracle directory. We navigated to this folder and confirmed the contents with ll, ls, or dir depending on your Operating System.

    Step 2: Copy file to AWS S3

    With the AWS CLI installed and configured, make sure that you can run some basic commands such as the ones below.

    And yes, our credentials file uses a profile. That profile name is the same name as the AWS user that we set up. The access key ID and the secret access key are located in the file ~/.aws/credentials (which is easily edited with ye olde vi or your favorite tool).
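
    For reference, a named profile in ~/.aws/credentials looks roughly like this (the profile name matches the example commands below; the key values are placeholders, not real credentials):

    [apex-s3-user]
    aws_access_key_id=AKIAXXXXXXXXXXXXXXXX
    aws_secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx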

    aws --version
    aws s3api list-objects-v2 --bucket sp-log-test --profile apex-s3-user

    If your AWS CLI is operational and your basic tests worked, then copy your dump file to the desired AWS S3 bucket. Please modify the paths and filenames to match your needs. You may need the --profile option if you have multiple profiles set up.

    aws s3 cp /oracle/admin/proddb/dpdump/my_app_20231107.DMP s3://the_bucket_name/my_app_20231107.DMP --profile my_profile

    Success? Does the file exist on AWS S3 using your browser? Is the file size the same? Good.
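
    If you prefer to check from the command line instead of the browser, a quick listing shows each object and its size in bytes (bucket and profile names here are the same placeholders used above):

    aws s3 ls s3://the_bucket_name/ --profile my_profile
    -- the size in bytes appears next to each object name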

    Log off of your lovely Linux server. You’re done with 1 database pump from 1 schema.

    Step 3: OCI CLI

    Log into a compute instance running Oracle Linux at OCI (OCI’s counterpart to an EC2 instance). Do the clicking and such. Don’t care much about the server. Small and disposable is the key. When we’re done with our migration, we don’t plan on having any EC2 instances.

    The OCI CLI was installed but the configuration file did not exist. I went to my personal user profile in OCI and asked for an API key. I pulled down my private and public keys, storing them locally. I copied the configuration snippet that has the fingerprint and all that loveliness. I did a vi on ~/.oci/config:

    [DEFAULT]
    user=ocid1.user.oc2..aaaazzzzzzzzzzzzzzzzzzzzzzzzzz5sa
    fingerprint=e7:8a:5d:1b:z1:07:30:3e:e1:a4:g2:a0:c8:i5:e8:2a
    tenancy=ocid1.tenancy.oc2..aazzzzzzzzzzzzzzzzzzzzaz42anlm5jiq
    region=us-langley-1
    key_file=/home/opc/.oci/cm_2023-11-08T18_50_02.746Z.pem
    

    We did a few basic tests to confirm functionality.

    oci os bucket list -c ocid1.compartment.oc2..aaaazzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzlqk7la
    -- SUCCESS
    
    oci os object get -bn mucket-20231102-1133 --file - --name docs.json
    -- SUCCESS
    
    oci os object get -ns whooci --bucket-name mucket-20231102-1133 --file /home/opc/fakeinvoice.pdf --name fakeinvoice.pdf 
    -- SUCCESS
    
    oci os object list -ns whooci --bucket-name mucket-20231102-1133 --all
    -- SUCCESS

    We can now copy files from Object Storage to the Linux server. YAY.

    At this point we declared that we were “walking the dog”. In other words, the dog was not walkin’ us around. We were in charge.

    Step 4: AWS CLI/ s3cmd

    We entirely failed at trying to install AWS CLI on this server. We pivoted (after a long time) to run with s3cmd. That installed nicely. Prior to jumping into this, you will need your AWS S3 credentials (see Step 2 above). Best to follow the instructions in https://www.howtogeek.com/devops/how-to-sync-files-from-linux-to-amazon-s3

    wget https://sourceforge.net/projects/s3tools/files/s3cmd/2.2.0/s3cmd-2.2.0.tar.gz
    tar xzf s3cmd-2.2.0.tar.gz
    cd s3cmd-2.2.0 
    sudo python setup.py install
    s3cmd --configure

    With the configure option, we keyed in the super-secret AWS credentials and the AWS user. Then we tested a few basic commands:

    s3cmd ls
    -- success, listed the stuff. 
    s3cmd get s3://my-bucket-test/yum.log-20190101 /home/opc/yum.log-20190101
    -- copied a file. success

    Step 5: Copy file from AWS S3 to Linux Server

    We successfully copied an 11GiB file from AWS S3 to the Linux server sitting at OCI.

    s3cmd get s3://my-bucket-test/myapp_20231107.DMP /home/opc/myapp_20231107.DMP
    -- success, 449s for 11G / 7.48min

    While it did take 7.5 minutes, we had already invested 16 hrs in failures (2 people failing hard for 2 work days).

    Step 6: Copy file from Linux Server to OCI Object Storage Bucket

    “To each their own”, that was the solution. So we now used the OCI CLI to put the file in a bucket. We sensed success instantly when the process broke our file into 84 parts. Given we were “local” to OCI, the transfer took just over half the time of the copy from AWS.

    oci os object put -ns whooci --bucket-name mucket-20231102 --file /home/opc/myapp_20231107.DMP

    It took about 4 minutes to transfer the file.
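
    To double-check that the upload landed intact, the same list command from the earlier tests works here too (namespace and bucket names are the ones from our examples):

    oci os object list -ns whooci --bucket-name mucket-20231102 --all
    -- confirm the object name and size match the source file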

    Step 7: Copy file from OCI Object Storage Bucket to Oracle’s DATA_PUMP_DIR

    The files that we need are located in the OCI bucket, and we need them in the Oracle database directory designated for data pump files. This is typically DATA_PUMP_DIR. To accomplish this, you will use the SQL command line and dbms_cloud.get_object.

    If you do not yet have credentials set up in the Autonomous Database, then you’ll need to create them.

    • go to OCI
    • Identity and Security
    • Domains then click default domain (bottom of the domains table, should be a blue link)
    • On the left side of the screen, click Users, select your name/username
    • Again, on the bottom half of the screen on the left side under Resources, select Auth Tokens
    • Click Generate Token button, give it a name, then click Generate Token button again
    • Copy the authentication code

    or

    • Select Profile (top right person icon)
    • Choose My Profile from the list
    • On the bottom half of the screen on the left side under Resources, select Auth Tokens
    • Click Generate Token button, give it a name, then click Generate Token button again
    • Copy the authentication code
    Both ways, you need to generate and copy the token. With the token in hand, create the credential in the database:

    begin
      dbms_cloud.create_credential (
        credential_name => 'OBJ_STORE_CRED_ME', --give the credential a recognizable name
        username => 'me@example.com', --use your OCI username
        password => 'superSecr3t' --auth token from steps above
      );
    end;

    After credentials are established, you can call GET_OBJECT:

    --use get_object code with credentials and specified directory
    BEGIN
    	DBMS_CLOUD.GET_OBJECT(
    	credential_name => 'OBJ_STORE_CRED_ME',
    	object_uri => 'https://objectstorage.us-langley-1.oraclegovcloud.com/n/exampleoci/b/mybucket/o/myapp_20231107.DMP',
    	directory_name => 'DATA_PUMP_DIR');
    END;

    Using the select statement below, monitor the transfer of the file until the file size matches the size of your source file. There is no notification that the file is complete. You can watch it grow.

    select * from dbms_cloud.list_files('DATA_PUMP_DIR');
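
    If the directory is busy, you can narrow the query to the file you care about; DBMS_CLOUD.LIST_FILES reports the current size in its BYTES column (the filename pattern below is from our example):

    select object_name, bytes
      from dbms_cloud.list_files('DATA_PUMP_DIR')
     where object_name like 'myapp%';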

    Step 8: Import the Data Dump

    This is covered in a separate article.

    What Did Not Work

    We spent 2 days exploring solutions, articles, and technology that failed on us, including suggestions from Oracle technical support. There may be better, easier, and smoother techniques. If so, write an article and share them.

    Challenges

    When we created our database backups (Data Pump/expdp) of our production commercial applications, we encountered very large files. We ran into compatibility issues between Oracle and AWS. And more… (of course).

    Copying a Big File/Migrating a Big File

    Our first attempt was on a database that tipped the scales at 11 gigabytes. That qualifies as a “big file” and is not easily moved with standard techniques. It must be treated as a multi-part upload or divided into smaller pieces by some other means. The advertised technique for copying a file from AWS S3 to an Object Storage bucket is

    DBMS_CLOUD.GET_OBJECT

    When used with dbms_cloud.create_credential, this worked so well that we thought we had made it. We tested with small files because we are human, impatient, and love success. This failed monstrously with big files.

    OCI Object Storage

    Object Storage Buckets do have a compatibility mode and are supposed to be compatible with AWS S3. I have written and maintained a PL/SQL AWS S3 API. It works modestly with OCI Object Storage, but not perfectly. I was not surprised to struggle for hours using the AWS Command Line Interface (CLI) trying to connect to OCI Object Storage to inventory objects in buckets and to move them.

    Clone & Copy Utilities

    We found numerous articles about clone and copy utilities. For this reason or that, we could not get these reliably installed and working on the various Linux servers.

    The 1-2-3 of Data Normalization

    Abstract

    In the 1960s and 1970s, Dr Edgar Codd, a British computer scientist, developed a means of reducing data duplication within relational databases. While six “forms” of the process exist, forms 1-4 ought to be at every database application developer’s fingertips. Fifty, nearly sixty, years have elapsed. Not only do these guidelines still apply today; when we forget them, we can cripple the flexibility and the ability of software to grow and adapt.

    Number Zero

    The Number Zero of the data normalization forms is called: Unnormalized. While I would love to joke about this structure of data, we all encounter it just too often. Therefore, let’s acknowledge that it exists.

    Movin’ on!

    Data Table (a requisite element)

    To normalize data, one must be working within a relational database environment, such as Oracle. The table must have zero, one, or more rows/records. Each row/record must have a primary key. The primary key must be unique. Quick checklist (a minimal example table follows the list):

    • Data stored in tables
    • Tables have 1 or more columns (fields)
    • Tables have 0, 1, or more rows (records)
    • Each row has a unique primary key that uniquely identifies one row
    • Each row of data must relate to 1 “item” (topic/element/etc)
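
    As a minimal sketch of that checklist (hypothetical names, Oracle-flavored SQL), a table with a generated unique primary key and one “item” per row might look like this:

    create table clients (
      client_id    number generated always as identity primary key, -- unique primary key per row
      client_name  varchar2(200) not null                           -- each row describes one client
    );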

    My Short Hand

    1. Model the world as it is, not as you want it to be. The world is messy. It is ok to use complex data structures to model complex data. “Complicated things are complicated”
    2. Think first: then add tables when needed; add fields when needed
    3. Look for lateral duplication
    4. Look for vertical duplication
    5. Look for non-data data (messy/junky stuff)

    Lateral Duplication

    Do not add a field called Contact2, ever.

    Do not add a field called email2, ever.

    Do not add a field called Address2, ever.

    If you have multiple contacts, add a table for the contacts. One row for each contact.

    If you find that your tables have columns duplicating sideways (laterally), then create a new table! Hey, people have multiple email addresses. Create a table with a primary key and 1 email address. Validate the quality of the email to the best of the application’s ability. Each new email address is a new row. Done.

    This is a casual, but functional means of defining the First Normal Form.
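
    As a sketch of that idea (continuing the hypothetical clients table above), the laterally duplicated email2/email3 columns become rows in a child table:

    create table client_emails (
      email_id   number generated always as identity primary key,
      client_id  number not null references clients (client_id),  -- which client owns this address
      email      varchar2(320) not null                           -- exactly one address per row
    );

    -- a second address for the same client is simply another row
    insert into client_emails (client_id, email) values (1, 'fred@flintstone.fake');
    insert into client_emails (client_id, email) values (1, 'barney@flintstone.fake');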

    Vertical Duplication

    Look at your tables. Do you have important data that is duplicated by users going vertically in a table? You can see this if you have multiple clients called Acme Anvil. If you have five contact people named Wiley E Coyote, then you have an opportunity to improve the data model.

    The solution tends to be new tables and breaking the data into smaller building blocks. When done well you work towards both the Second and Third Normal Forms.

    Non-Data Data

    When we ask users to type free-form data into fields, we tend to get junk. They enter “data” that is not really data. I am not talking about client names or personal names but all of those reason fields and note fields, that free-form information. We mistype; we abbreviate; we get verbose; we get terse. In short, we get junk. Unless we use artificial intelligence or wickedly complicated code, we lose the value of what the user is trying to tell us! Junk is junk!

    If data matter to us, then we ought to use lookup tables and codes so that we capture what users need to track. It takes listening and tracking what they are doing. That is a great reason to talk with users and talk with your client. It is their data. We like good data. Good data has value. Bad data is bad.

    First Normal Form (1NF)

    To satisfy the First Normal Form of relational databases:

    • Each column (field) of a table must have a “single value”. This may be a primary key, a foreign key, a client name, etc. When we store compound data within a field, we’ve stepped off the path. Sorry, JSON.

    How do we break this rule?

    Let’s imagine a table with client data. The classic table has the unique primary key and the client name. Imagine needing an address. Let’s add an address column to the table. Into the column, we type the address: Spui 70 (boîte postale 12600), 2500 DJ La Haye, Nederland (that’s The Hague). Suddenly, I need another address for the same location. So I add in a delivery address for heavy trucks. If I put a second address into the ADDRESS field, then I have eliminated the usefulness of the ADDRESS field. And I have not followed the First Normal Form.

    Another example with the client table: imagine a field for the email address of your contact person there. Store one address: compliant. Store two in that field and you are not compliant. Game over.

    fred@flintstone.fake is fair

    fred@flintstone.fake, barney@flintstone.fake in the same field breaks the rules.

    Why is this bad?

    Imagine that you need to email the client and you have two email addresses in the same field called EMAIL. One email vendor wants commas to separate the addresses. A different email vendor wants a semicolon between the two. You’ve lost control of the formatting, validity, and usefulness of the email addresses. It is useless data (unless you write code that parses it out).

    Same with printing a mailing label for the Hague address above. You want precisely and exactly one address on a shipping label. If your ADDRESS field has two, then you are sunk. You lost the data game. No points for you.

    Second Normal Form (2NF)

    To satisfy the Second Normal Form (2NF):

    • Each row must have a unique primary key.
    • Rows ought to describe unique data sets

    Sure, yes, that is in the requisite steps listed above. Sometimes, we create data tables that require applications to always use multiple keys to find a desired row. In this situation, we use multiple or compound keys to access data.

    Violating this rule happens all of the time. Can you envision the need to store a client in the client table that has two divisions at two locations? Some people would create two separate clients, each with one address. But are they two clients or one? We now have the name duplicated. In these rows, we have ACME ANVILS:

    Primary Key | Client Name | Address
    11235       | Acme Anvil  | 123 Roadrunner Ave, WB
    2358        | Acme Anvil  | 123 Wiley Ln, WB

    Table: Second Data Normalization Form

    We do not know if this is one client with two addresses or two clients. We failed to model the complexity of the real-world data. We must solve problems with multiple divisions and multiple addresses. The solution tends to involve adding tables that have greater specificity.

    We are creating redundancy in the data going vertically. Which Acme Anvil do we need? How do we identify the one we want?

    Why is this bad?

    You want to place your finger on the required data elements with ease. If you don’t know which is which, then how will the application?

    How to Fix?

    Typically, we need to add fields and tables to more completely and more robustly model the data as it is, not as we want it to be. Yes, the real world is messy. Clients have divisions, multiple offices, multiple addresses, multiple contact people. Roll your sleeves up, figure out the data needed and how to relate it.

    Do not be afraid to view the data world as small data blocks that snap together like Lego bricks.

    In this case, maybe we have a table for divisions. We create a table for addresses. Maybe we get super clever and add a client foreign key that references the primary key in its own table. This self-reference creates a tree structure that supports hierarchical queries. Super powerful for modelling data that nests within itself. With this technique, a client becomes a child of another client, which can represent a division within a client.
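
    A sketch of that self-reference, again using the hypothetical clients table: a nullable parent_client_id points back at the same table, and a hierarchical query walks the tree so a division appears beneath its parent client.

    alter table clients add parent_client_id number references clients (client_id);

    -- list each top-level client with its divisions indented beneath it
    select lpad(' ', 2 * (level - 1)) || client_name as client_tree
      from clients
     start with parent_client_id is null
    connect by prior client_id = parent_client_id;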

    Third Normal Form (3NF)

    Data typed by hand is:

    1. Flawed/error-prone
    2. Inconsistent

    To satisfy the 3rd Normal Form:

    • Use Lookup Tables and lists to maintain consistency within the data

    Often, when we are asking users to classify or identify the qualities of data, we are asking them to type something. We had a situation today where a user knew that a client would have a tax ID but did not yet have one. So the user typed 0000.0000.0000, which meant, “I put in a placeholder”. It is not a valid tax ID. The team failed to model this event in the lifecycle of client data. We use databases to organize and query data, often massive amounts of data. Consistency matters. Maybe there ought to be a field that says: Tax ID pending. Or maybe there is a lifecycle for the tax ID.

    • Client does not have a tax ID and does not need one
    • Client applied for a tax ID and it is pending
    • Client has a tax ID and it is typed in
    • The system validated the tax ID against an external authority.

    Now we have four states of the tax ID in addition to the tax ID number. We get more information.

    To satisfy the Third Normalization Form, we should create a table for Tax ID status:

    • No Tax ID Required (NONE)
    • Tax ID Pending (PENDING)
    • Tax ID entered (EXISTS)
    • Tax ID validated with external authority (VALID)

    We then use a lookup table in conjunction with the Tax ID field to provide a more complete understanding of the data we are capturing. If a Tax ID is empty, we would have to guess. Is the Tax ID empty because the client does not need one? Or is it empty because the Tax ID is pending? If guessing, you’ll be right sometimes! 50-50, right?

    When data become dependent on each other, we add tables and lookups to enforce consistency within the data set.

    Had we asked users to type a reason for the absence of the Tax ID, we would see all sorts of junk! We would have turned an opportunity to improve the quality of data into a mess. If your users select from lists, then we are working toward satisfying the Third Normal Form. My shorthand version of this sounds like this: use lookup tables to validate data.
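
    A sketch of that lookup, using the same hypothetical clients table: the four states above become rows in a small status table, and the client row references a status code instead of a free-typed reason.

    create table tax_id_statuses (
      status_code  varchar2(10) primary key,   -- NONE, PENDING, EXISTS, VALID
      description  varchar2(100) not null
    );

    insert into tax_id_statuses values ('NONE',    'No Tax ID required');
    insert into tax_id_statuses values ('PENDING', 'Tax ID applied for and pending');
    insert into tax_id_statuses values ('EXISTS',  'Tax ID entered');
    insert into tax_id_statuses values ('VALID',   'Tax ID validated with external authority');

    alter table clients add tax_id varchar2(20);
    alter table clients add tax_id_status varchar2(10) references tax_id_statuses (status_code);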

    Other Normal Forms

    Data get complicated because, friends, our world is complicated. We can handle tables with multiple data dependencies. Normal Forms four, five, and six do exist. And while I barely understand the terminology used in the academic literature and even online, following these forms leads us into gloriously interesting structures: child tables that matrix to multiple parent tables. Once playing in these deeper waters, peers tend to sit and discuss how to work through these issues.

    Conclusion

    When making decisions, we always make trade-offs. We violate the normal forms both accidentally and on purpose. In my podcast, The Soul of an Internet Machine, I discuss data normalization. In the late 1960s, people were not writing software that ran globally, supporting 100 spoken languages across two dozen time zones circling the earth. We did not create a means of supporting data in multiple languages simultaneously. If we duplicate the data laterally for English, French, Dutch, German, and Italian, we violate normal forms. If we create new rows, we may also violate normal forms (it depends on the structure). If we store the phrases within JSON data inside of a field, we again violate the rules. So we plan and discuss.

    When we add fields such as contact1, contact2, contact3, contact4 or even email1, email2, email3, email4, then we’ve created structural flaws in our applications from which we may never recover. Our application has a limited useful life until we bring it closer to the Normal Forms. But strict adherence may yield other problems. I get as close as I can to the Normal Forms, then I acknowledge… maybe not.

    Selecting Font APEX from a Database Table

    A quick means of grabbing Font APEX icon names from the CSS text file. For those of us who work on database applications from the server side and feel less sophisticated with client-side activities, I wanted to keep an inventory of the icons available in Oracle APEX within standard database tables. Why would a software developer include fonts in a column? Imagine scoring with a star-based rating system, or the simplicity of thumbs up/thumbs down.

    SQL Development in Multiple Languages

    SQL development must work on a global platform, but only if we get the settings correct to display languages properly in our code. When an accent grave goes bad, it looks like an acute error (that’s a bad pun).