Technical Infrastructure

From ADA Public Wiki
Latest revision as of 01:39, 4 December 2025

Repository Software

The ADA implements the OAIS Reference Model [42] with deployed Dataverse [49] installations for the SIP (deposit.ada.edu.au), AIP (dataverse-test.ada.edu.au) and DIP (dataverse.ada.edu.au).

The Dataverse Project [49] is open source and community-supported code. The Dataverse team has described how the Dataverse software meets CoreTrustSeal Technical Infrastructure requirements [50].

ADA also implements web-based tools, including those developed and maintained in-house, to support its archiving process:   

Open Source: 

  • Metabase for reporting analytics [82]
  • osTicket task management and tracking application [51]

In-House:

  • Curation and Risk Assessment Tool (CARAT) [73]  
  • ADA Deposit and Preservation Tool (ADAPT) [6]  
  • Ingest Reporting Tool (upload of SPSS, generates data dictionary, quality check, confidentiality check) [74]
  • Coordinated Access to Data, Research and Environments (CADRE) [97]

Data access provision from both production DIP instances, dataverse.ada.edu.au and anu-dataverse.ada.edu.au, is managed through the osTicket [51] ticketing software. Access is granted or rejected in Dataverse itself by adding file permissions for approved users. osTicket must therefore be available alongside Dataverse for access requests to be managed.

The ADA has implemented CADRE (Coordinated Access to Data, Research and Environments) [97], a stand-alone web application built around the 5 Safes [96]. CADRE integrates with ADA's Production Dataverse [10] for specific datasets, making access requests and their approval or rejection more efficient. Requests for Production Dataverse [10] datasets managed through CADRE [97] are approved and revoked within CADRE [97], and access is then granted to or revoked from users in ADA's Production Dataverse [10] programmatically.
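The programmatic granting and revoking described above could look roughly like the following Python sketch. It is not CADRE's actual code, which this page does not show. It assumes the `grantAccess` and `revokeAccess` endpoints of the Dataverse Data Access API, which should be checked against the API Guide for the installed Dataverse version. The file id, user identifier and API token are hypothetical placeholders. The functions only build the HTTP requests; nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import urllib.request

# Header name used by Dataverse for API token authentication.
API_KEY_HEADER = "X-Dataverse-key"


def build_grant_request(base_url: str, file_id: int, user: str,
                        api_token: str) -> urllib.request.Request:
    """Build (but do not send) a request granting `user` access to a file."""
    url = f"{base_url.rstrip('/')}/api/access/datafile/{file_id}/grantAccess/{user}"
    return urllib.request.Request(
        url, method="PUT", headers={API_KEY_HEADER: api_token}
    )


def build_revoke_request(base_url: str, file_id: int, user: str,
                         api_token: str) -> urllib.request.Request:
    """Build (but do not send) a request revoking `user`'s access to a file."""
    url = f"{base_url.rstrip('/')}/api/access/datafile/{file_id}/revokeAccess/{user}"
    return urllib.request.Request(
        url, method="DELETE", headers={API_KEY_HEADER: api_token}
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    req = build_grant_request(
        "https://dataverse.ada.edu.au", 42, "@jsmith", "REPLACE-WITH-TOKEN"
    )
    print(req.get_method(), req.get_full_url())
```

Keeping request construction separate from sending makes the approval logic easy to test without touching a production installation.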

ADA will gradually manage all its dataset access requests through CADRE [97].

Version Control

The Harvard Dataverse Project team uses GitHub [54] for its version control system. Any tools or software that ADA produces in-house are also managed through GitHub.

IT Service Management

The ADA Technical Manager has initiated the YASM Service Management framework, chosen for its simplicity compared with the ITIL Service Management framework. As part of this YASM framework, a Service Portfolio has been created to record and track ADA’s various internal and external services.

At the end of every calendar month, the ADA DevOps role performs updates and maintenance tasks on all of ADA’s services to keep code and applications up to date. A simple Change Management approach is used to inform the internal ADA team and external users of changes that upcoming Dataverse upgrades will introduce. ADA maintains Dataverse installations whose sole purpose is to test new Dataverse releases before the update is pushed to ADA’s three primary public installations. ADA plans its release schedule for new Dataverse versions in step with the Dataverse Project’s releases. Because not all available functionality suits ADA, new functionality is tested and evaluated to determine when to enable it.

NCI [7] performs an automated weekly set of security tests on ADA’s NCI-hosted services. A report of any identified issues is emailed to the ADA technical team and to the ADA Director, and the ADA technical team carries out maintenance to address them. ADA also receives email alerts from the Australian Signals Directorate (Australia’s national cyber security and intelligence agency) [98] and/or the ANU Information Security Office (ISO) [99] that draw attention to problems discovered in one or more ADA services and request that they be addressed. The ADA technical team works to address those problems and reports back to the alerting organisation.

Infrastructure Standards

ANU and NCI [7] have security standards in place to prevent ANU and NCI technical infrastructure from being adversely affected. NCI monitors the primary Dataverse installations with its f5 WAF [55] service.   

Availability, Bandwidth & Connectivity

ADA's web services, including its Dataverse installations, are available 24/7. Data access requests are prioritised and managed within the time capabilities of the ADA Access Management team. Notices are posted on ADA’s website and production Dataverse instances ahead of planned ADA shutdowns so that users can submit data access requests in a timely manner.

NCI [7] manages network availability and bandwidth for ADA’s NCI-hosted services:

Network details (17/10/2025):

  • Availability: network is at ~99.95%
  • Bandwidth: over 10 Gbps available
  • Connectivity: redundant network connectivity to ADA services
  • Disaster recovery: active, with automatic failover network recovery

ADA's VM Datastore(s):

ADA's VMs are all hosted on high availability compute clusters that will restart any VMs on surviving compute hosts within their hosted data centre cluster. ADA's VMs can also be manually restarted in the alternate data centre if there is a whole compute cluster failure or whole data centre outage.

NAS File Server Volume(s):

ADA's NAS file server is hosted on a high availability storage cluster that will fail over any file server to surviving storage nodes within its hosted data centre cluster. ADA's file server can also be manually restarted in the alternate data centre if there is a whole storage cluster failure or whole data centre outage.
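As a rough illustration of what the ~99.95% network availability figure quoted above implies, the Python sketch below converts an availability percentage into the downtime it allows over a period. The function name and the 365-day and 30-day periods are assumptions made for the example; they are not NCI's published service terms.

```python
def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Return the hours of downtime permitted by an availability percentage."""
    return period_hours * (1 - availability_pct / 100)


HOURS_PER_YEAR = 365 * 24   # assumes a non-leap year
HOURS_PER_MONTH = 30 * 24   # assumes a 30-day month

if __name__ == "__main__":
    yearly = allowed_downtime_hours(99.95, HOURS_PER_YEAR)
    monthly = allowed_downtime_hours(99.95, HOURS_PER_MONTH)
    print(f"per year:  about {yearly:.2f} hours")
    print(f"per month: about {monthly * 60:.1f} minutes")
```

Under these assumptions, 99.95% corresponds to roughly 4.4 hours of network downtime per year, or a little over 20 minutes per month.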

NCI also manages its f5 WAF that provides a level of protection for ADA’s Dataverse installations.

NCI manages SSL certificates for ADA’s web-based services. It informs the ADA technical team when certificates are about to expire and reissues them for installation on ADA’s virtual machines (VMs).
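A certificate expiry check of the kind implied above can be sketched in Python with the standard `ssl` module. This is an illustration only, not NCI's or ADA's tooling. The 30-day warning threshold and the function names are assumptions. `fetch_not_after` needs network access and is shown for completeness; the expiry arithmetic works on any `notAfter` string.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the 'notAfter' field of the certificate presented by host."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the certificate's expiry."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within `warn_days` (assumed threshold)."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    sample = "Jan  5 09:34:43 2018 GMT"  # format used by ssl.getpeercert()
    now = datetime(2018, 1, 1, 9, 34, 43, tzinfo=timezone.utc)
    print(days_until_expiry(sample, now), needs_renewal(sample, now))
```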

NCI alerts ADA about planned NCI infrastructure outages. The ADA team posts messaging on the Dataverse installations, and the ADA website, to alert users that the systems will be offline on the specified date(s) and time(s).

The ADA technical team has deployed monitoring software that detects when ADA’s VMs go offline and emails the ADA Director, Technical Manager and DevOps roles. The ADA DevOps role, supported by the Technical Manager where needed, works to get the systems back online, consulting with NCI if necessary.
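The monitoring product ADA uses is not named here, so the following Python sketch only illustrates the general pattern described above: test whether a VM accepts connections and, if not, prepare an alert email. The host names, sender address and recipients are hypothetical, and the message is built but not sent.

```python
import socket
from email.message import EmailMessage


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_alert(host: str, recipients: list[str]) -> EmailMessage:
    """Compose (but do not send) an 'offline' alert for the given host."""
    msg = EmailMessage()
    msg["Subject"] = f"[ADA monitor] {host} appears to be offline"
    msg["From"] = "monitor@example.org"  # hypothetical sender address
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"No TCP response was received from {host}.")
    return msg


if __name__ == "__main__":
    host = "vm.example.org"  # hypothetical VM host name
    if not is_reachable(host, 443):
        alert = build_alert(host, ["director@example.org", "devops@example.org"])
        print(alert)  # a real monitor would hand this to smtplib.SMTP.send_message
```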

Disaster Recovery

NCI [7] manages snapshots of ADA’s data as follows:

ADA's VM Datastore(s):

  • Snapshot every 3 hours and retained for 24 hours.
  • Snapshot every 24 hours and retained for 14 days.
  • Mirrored to alternate data centre daily and retained for 24 hours.

NAS File Server Volume(s):

  • Snapshot every hour and retained for 24 hours.
  • Snapshot every day and retained for 7 days.
  • Snapshot every week and retained for 1 month.
  • Mirrored to alternate data centre daily and retained for 24 hours.
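The snapshot schedules listed above translate into a fixed number of restore points at any one time. The Python sketch below works these out. It assumes "1 month" means 30 days and that a snapshot is dropped as soon as it passes its retention period, neither of which is confirmed by NCI's description.

```python
from datetime import timedelta


def snapshots_retained(interval: timedelta, retention: timedelta) -> int:
    """Number of snapshots held at steady state for one schedule."""
    return retention // interval


# (label, snapshot interval, retention period) taken from the lists above.
SCHEDULES = [
    ("VM datastore, 3-hourly", timedelta(hours=3), timedelta(hours=24)),
    ("VM datastore, daily", timedelta(hours=24), timedelta(days=14)),
    ("NAS volume, hourly", timedelta(hours=1), timedelta(hours=24)),
    ("NAS volume, daily", timedelta(days=1), timedelta(days=7)),
    ("NAS volume, weekly", timedelta(weeks=1), timedelta(days=30)),  # assumed 30-day month
]

if __name__ == "__main__":
    for label, interval, retention in SCHEDULES:
        print(f"{label}: {snapshots_retained(interval, retention)} snapshots")
```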

Every 3 months ADA’s NCI mdss (mass data storage system) data is automatically tarballed and copied to the NCI mass storage / tape silo service.

The Dataverse databases are automatically backed up daily:

  • Locally to their respective VM - retained for 1 month.
  • To NCI external storage - retained for 6 months.
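Retention periods like those above are typically enforced by a pruning step that selects backups older than a cutoff. The Python sketch below shows that selection only. It is not ADA's backup script, and the 30-day reading of "1 month" is an assumption.

```python
from datetime import datetime, timedelta


def expired_backups(backup_times: list[datetime], now: datetime,
                    retention: timedelta) -> list[datetime]:
    """Return the backup timestamps that fall outside the retention window."""
    cutoff = now - retention
    return [t for t in backup_times if t < cutoff]


if __name__ == "__main__":
    now = datetime(2025, 12, 4)
    # Forty consecutive daily backups, newest first (illustrative data).
    daily = [now - timedelta(days=i) for i in range(40)]
    local_retention = timedelta(days=30)  # "1 month", assumed to be 30 days
    to_delete = expired_backups(daily, now, local_retention)
    print(f"{len(to_delete)} local backups are due for removal")
```

The same function would serve the six-month external copies by passing a longer `retention` value.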


SSL Certificates and Domains

NCI is consulted on any technical issues related to SSL certificates and may advise the ADA team to consult ANU ITS. ANU ITS is also consulted on technical issues relating to ADA’s web service domains.

Technical Change

The Dataverse GitHub repo [53] is monitored for new releases. The ADA Director and Technical Manager are also members of the Dataverse User Community [19] and learn of new releases through that group. The ADA Technical Manager documents any Dataverse bugs or new features needed by ADA on the Dataverse GitHub repo [53].
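How the repo is monitored is not specified here. One possible approach, sketched below in Python, is to query the GitHub REST API's "latest release" endpoint for the IQSS/dataverse repository and compare its tag with the installed version. The installed version shown is hypothetical, and the version parser assumes simple tags such as `v6.5`; tags with suffixes would need extra handling.

```python
import json
import urllib.request


def fetch_latest_tag(repo: str = "IQSS/dataverse") -> str:
    """Return the tag name of the repo's latest release (needs network access)."""
    url = f"https://api.github.com/repos/{repo}/releases/latest"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["tag_name"]


def parse_version(tag: str) -> tuple[int, ...]:
    """Turn a tag such as 'v6.5' into a comparable tuple such as (6, 5)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def is_newer(latest_tag: str, installed_tag: str) -> bool:
    """True when the latest release is newer than the installed version."""
    return parse_version(latest_tag) > parse_version(installed_tag)


if __name__ == "__main__":
    installed = "v6.4"  # hypothetical installed version
    latest = fetch_latest_tag()
    if is_newer(latest, installed):
        print(f"New Dataverse release available: {latest}")
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating `v5.14` as older than `v5.9`.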

Requests for Dataverse functionality that ADA requires but finds missing are raised as issues in the Dataverse GitHub [53] for consideration.

References

[42] Open Archival Information System (OAIS) Reference Model – (https://public.ccsds.org/pubs/650x0m2.pdf)

[49] The Dataverse Project – (https://dataverse.org)

[82] Metabase – (https://www.metabase.com/)

[51] osTicket – (https://github.com/osTicket/osTicket)

[73] ADA CARAT tool – (https://github.com/ADA-ANU/ADA_Research_Data_Tools/tree/main/ADA_DRAT_v2)

[6] ADAPT – (https://docs.ada.edu.au/index.php/ADAPT)

[74] ADA Ingest Reporting Tool – (https://github.com/ADA-ANU/ADA_Research_Data_Tools/tree/main/ADA_reports)

[97] CADRE – (https://cadre.ada.edu.au)

[96] 5 Safes – (https://fivesafes.org/)

[10] ADA Production Dataverse – (https://dataverse.ada.edu.au/)

[7] National Computational Infrastructure – (https://nci.org.au/)

[53] Dataverse GitHub – (https://github.com/IQSS/dataverse/issues)

[19] Dataverse User Community – (https://groups.google.com/g/dataverse-community?pli=1)