Clinical SAS Interview Questions

What therapeutic area have you worked in?
Recently I have been working in the Oncology therapeutic area, on different cancer types such as:
1. Renal Cell Cancer (RCC)
2. Metastatic Colorectal Cancer (MCC)
3. Advanced Non-Small Cell Lung Cancer
Earlier I worked on Neurosciences and Specialty Care studies such as Diabetes and Depression.

What are your responsibilities?
Some of them include (not necessarily all of them):
· Extracting data from various internal and external databases (Oracle Clinical, CSV files, Excel spreadsheets) using SAS/ACCESS, INPUT statements and PROC DOWNLOAD
· Creating and deriving the analysis datasets, listings and summary tables for different clinical trial protocols
· Mapping, pooling and analysis of clinical study data for safety
· Using Base SAS procedures (MEANS, FREQ, UNIVARIATE, SUMMARY, TABULATE, REPORT etc.) for summarization, cross-tabulations and statistical analysis
· Developing macros to automate listings and summary tables for multiple protocols that have similar safety tables and listings
· Validating and QC of the efficacy and safety tables
· Creating ad hoc reports using SAS procedures, with ODS statements to generate different output formats such as HTML and PDF
· Creating statistical reports using PROC REPORT, DATA _NULL_ and SAS macros
· Analyzing the data according to the Statistical Analysis Plan (SAP)
· Generating tables for demographics, adverse events, labs, concomitant treatment/medication and Quality of Life (QoL)
· Quality control, and reporting data issues (outliers, qualifiers and missing data) directly to the data management team

Can you tell me something about your last project's study design?
I recently worked on protocol A4061023, which studies one of the potential compounds, Axitinib (AG13736), in Refractory Metastatic Renal Cell Cancer (RCC).
It is a Phase II, open-label, non-randomised, single-group study of the safety and efficacy of this drug. The study has 62 subjects enrolled. The primary endpoint is Objective Response Rate (ORR), that is CR and PR by RECIST (Response Evaluation Criteria In Solid Tumors), which gives the response rate of Axitinib in patients with RCC. Secondary endpoints include progression-free survival (PFS), duration of response (DR), the FKSI (cancer-related symptoms) QoL questionnaire and the safety profile of AG-13736. FKSI-15 is a 15-item Functional Assessment of Cancer Therapy (FACT) scale for patients with kidney cancer.
Some of the inclusion criteria were: 18 or older, any gender, and
1. RCC with metastases and nephrectomy
2. Failure of prior sorafenib-based therapy
Some of the exclusion criteria were:
1. Gastrointestinal abnormalities
2. Active seizure disorder
RCC (renal cell carcinoma) is a kidney neoplasm; "renal" means of or relating to the kidneys.

What is the primary and secondary endpoint in your last project?
As in the study design above: the primary endpoint is ORR (CR, PR by RECIST); the secondary endpoints include PFS, DR, the FKSI QoL questionnaire and the safety profile.

How did you create analysis datasets?
Analysis datasets are used for the statistical analysis of the data. They contain the raw data and the variables derived from the raw datasets (for example, variables such as Primdiag, Prad, Pcancer and Bpbp). The variables derived from the raw data are used to produce the TLGs of the clinical study. The safety and efficacy endpoints (parameters) dictate which datasets the study needs in order to generate the statistical reports (TLGs).
One important thing to keep in mind while generating a VAD is to exclude variables that will never be used for any analysis and are collected only to facilitate data collection. This keeps the size of the VAD to a minimum and improves performance as well.

How many analysis datasets did you create?
Again, it depends on the study and on the safety and efficacy parameters that need to be determined from it. Approximately 20-30 datasets are required for a study to be analyzed for safety and efficacy. For example: DM (Demographics), MH (Medical History), AE (Adverse Events), PE (Physical Examination), EG (ECG), VS (Vital Signs), CM (Concomitant Medication), LB (Laboratory), QS (Questionnaire), IE (Inclusion and Exclusion), EX (Exposure), CO (Comments), DT (Death).
Some of the common AEs are:
· Clinical symptoms and signs
· Abnormal lab test findings
· Changes in physical examination findings
· Progression/worsening of underlying disease
· Hypersensitivity

Can you explain something about the datasets?
DEMOGRAPHIC analysis dataset contains all subjects' demographic data (i.e., Age, Race and Gender), disposition data (i.e., date the patient withdrew from the study), treatment groups and key dates such as date of first dose, date of last collected Case Report Form (CRF) and duration on treatment. The dataset has the format of one observation per subject.
EFFICACY analysis dataset contains derived primary and secondary endpoint variables as defined in the SAP. In addition, this dataset can contain other efficacy parameters of interest, such as censor variables pertaining to the time to an efficacy event. This dataset has the format of one record per subject per analysis period.
LABORATORY analysis dataset contains all subjects' laboratory data, in the format of one observation per subject, per visit, per test, per lab result number. Here we derive the study visits according to the study window defined in the SAP. If the laboratory data are collected from multiple local lab centers, this analysis dataset will also centralize the laboratory data and standardize measurement units by using conversion factors.

What do you mean by treatment emergent adverse events?
A treatment-emergent adverse event is defined as any event not present prior to the initiation of the treatments, or any event already present that worsens in either intensity or frequency following exposure to the treatments.
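The treatment-emergent definition above can be sketched in a DATA step. This is only an illustration under stated assumptions: the dataset names (DEMOG, AE), the variables (SUBJID, FIRSTDOSE, AESTDT, AETERM, TEAEFL) and the sample values are hypothetical, and only the "not present before first dose" half of the definition is covered. Flagging a pre-existing event that worsens would need a severity comparison defined in the SAP.

```sas
/* Hypothetical sample data: one record per subject with the first dose date */
data demog;
    input subjid firstdose :date9.;
    format firstdose date9.;
    datalines;
101 15JAN2007
102 20JAN2007
;
run;

/* Hypothetical sample data: one record per adverse event */
data ae;
    input subjid aestdt :date9. aeterm :$20.;
    format aestdt date9.;
    datalines;
101 10JAN2007 Headache
101 18JAN2007 Nausea
102 25JAN2007 Fatigue
;
run;

/* Both datasets are already in SUBJID order, so they can be merged directly */
data ae_teae;
    merge ae(in=in_ae) demog;
    by subjid;
    if in_ae;                       /* keep adverse event records only */
    length teaefl $1;
    /* Treatment emergent when the event starts on or after first dose */
    if n(aestdt, firstdose) = 2 and aestdt >= firstdose then teaefl = 'Y';
    else teaefl = 'N';
run;
```

Events with a missing start date are flagged 'N' in this sketch; in a real study the imputation rules from the SAP would be applied before the comparison.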
ADVERSE EVENT analysis dataset contains all adverse events (AEs) reported, including serious adverse events (SAEs), for all subjects. A treatment emergent flag will be calculated, as well as a flag to indicate whether an event is reported within 30 days after the subject permanently discontinued from the study. Partial dates and missing AE start and/or stop dates will be imputed using logic defined in the SAP. This dataset has the format of one record per subject per adverse event per start date.
Sometimes the analysis datasets carry variables that are not required for the statistical reports but may be required for ad hoc reports.
It is crucial to generate analysis datasets in a specific order, as some variables derived in one analysis dataset may be used as inputs to derive variables in other analysis datasets. For example, the Cycles dataset is a VAD generated from the number of days in a cycle as defined in the protocol, and it is then used in generating other VADs.

What do you mean when you say you created tables, listings and graphs for ISS and ISE?
There are many reasons to integrate and summarize all the data from a clinical trial program. Each clinical trial in the program is unique in its objective and design. Some are small safety studies among normal volunteers, while others are efficacy trials in a large patient population. The primary reason to create an integrated summary is to compare and contrast the various study results and arrive at one consolidated review of the benefit/risk profile. Also, pooling the data from various studies enables the examination of trends in rare subgroups of patients, such as the elderly, those with differing disease states (mild vs. severe), etc.

What is your involvement while using CDISC standards? What is meant by CDISC and where do you use it?
CDISC (Clinical Data Interchange Standards Consortium) is an organization which develops industry standards for pharmaceutical companies to submit clinical data to the FDA.
There are many advantages of using CDISC standards: reduced time for regulatory submissions, more efficient regulatory reviews of submissions, and savings in time and money on data transfers among businesses.
CDISC standards are used in the following activities:
· Developing CRTs for submission to the FDA for an NDA
· Mapping, pooling and analysis of clinical study data for safety
· Creating the annotated case report form (eCRF) using CDISC-SDTM mapping

Can you tell me about CRTs?
These are Case Report Tabulations (CRTs), used for an NDA electronic submission to the FDA. The Food and Drug Administration (FDA) now strongly encourages all new drug applications (NDAs) to be submitted electronically. Electronic submissions could help FDA application reviewers scan documents efficiently and check analyses by manipulating the very datasets and code used to generate them. The potential saving in reviewer time and cost is enormous, while improving the quality of oversight.
One important part of the application package is the case report tabulations (CRTs), now serving as the instrument for submitting datasets. CRTs are made up of two parts:
1. Datasets in SAS transport file format
2. The accompanying documentation for the datasets
In addition to raw and derived variables, each dataset should include core demographics information such as age, sex, race, ethnicity and site location. This helps reviewers track and analyze basic information quickly.
Note that some FDA reviewers' software requires that files are first loaded into random access memory (RAM), so individual files cannot exceed 25 MB. To accommodate this, programmers might need to break up large datasets. In practice, however, the Guidance explains that applicants may discuss what data to include as part of the CRTs with the FDA review division prior to the electronic submission.

Where do you use MedDRA and WHO? Can you write a code? How do you use it? What is MedDRA?
The Medical Dictionary for Regulatory Activities (MedDRA) has been developed as a pragmatic, clinically validated medical terminology. MedDRA is applicable to all phases of drug development and to the health effects of devices. MedDRA is used to report adverse event data from clinical trials.

What are the structural elements of the terminology in MedDRA?
The structural elements of the MedDRA terminology are as follows:
SOC (System Organ Class) - Highest level of the terminology, distinguished by anatomical or physiological system, etiology, or purpose
HLGT (High Level Group Term) - Subordinate to SOC, superordinate descriptor for one or more HLTs
HLT (High Level Term) - Subordinate to HLGT, superordinate descriptor for one or more PTs
PT (Preferred Term) - Represents a single medical concept
LLT (Lowest Level Term) - Lowest level of the terminology, related to a single PT

THE WHODRUG DICTIONARY:
The WHODRUG dictionary was started in 1968. At present, 52 countries submit medication data to the WHO Collaborating Center, which is responsible for the maintenance and distribution of the drug dictionary. The dictionary contains information on both single and multiple ingredient medications. Drugs are classified according to the type of drug name being entered (i.e., proprietary/trade name, nonproprietary name, chemical name, etc.).

How do you do data cleaning?
It is always important to check the data we are using, especially the variables we actually use. Data cleaning is critical for the data we are using and preparing. I use PROC FREQ, PROC MEANS, PROC SQL, PROC COMPARE and some utility functions (date functions, etc.) to clean the data.
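The data cleaning checks named in the answer on data cleaning (PROC FREQ, PROC MEANS, PROC SQL) might look like the sketch below. The DEMOG dataset, its variables and its values are made up for illustration; real checks would follow the CRF and the data management plan.

```sas
/* Hypothetical demographics data with a few deliberate problems */
data demog;
    input subjid sex $ age;
    datalines;
101 M 54
102 F 61
102 F 61
103 X 47
104 M .
105 F 430
;
run;

/* Unexpected categories and missing values show up in the frequency table */
proc freq data=demog;
    tables sex / missing;
run;

/* Range check: N, number missing, minimum and maximum of a numeric variable */
proc means data=demog n nmiss min max;
    var age;
run;

/* Subjects with more than one record where only one is expected */
proc sql;
    select subjid, count(*) as n_records
    from demog
    group by subjid
    having count(*) > 1;
quit;
```

Each step only reports; any correction of the data would normally go back to data management rather than being made in the analysis program.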
What do you mean by "used Macro facility to produce weekly and monthly reports"?
The SAS macro facility can do a lot of things; in particular it is used to:
• reduce code repetition
• increase control over program execution
• minimize manual intervention
• create modular code

What would you do if you had to pool the data from one parallel study and one crossover study? OR: if you have the same subject in two groups taking two different drugs, and you had to pool these two groups, how would you do it?
This situation arises when the study is a crossover design. I would consider the same patient as two different patients, one in each treatment group.

Did you ever see a patient randomized to one drug but given another drug? If so, in which population would you put that patient?
This situation is almost impossible, but if it happens I will put that patient in the group of the drug that he was actually given.

How do you deal with missing values? OR: if some patient misses one lab, how would you assign values for the missing values? Can you write the code?
Whenever SAS encounters an invalid or blank value in the file being read, the value is defined as missing. In all subsequent processes and output, the value is represented as a period (if the variable is numeric) or is left blank (if the variable is character).
In DATA step programming, use a period to refer to missing numeric values. For example, to recode missing values in the variable A to the value 99, use the following statement:
    IF a=. THEN a=99;
Use the MISSING statement to define certain characters to represent special missing values for all numeric variables. The special missing values can be any of the 26 letters of the alphabet, or an underscore. In the example below, the values 'a' and 'b' will be interpreted as special missing values for every numeric variable:
    MISSING a b;

How would you transpose a dataset using a DATA step?
    proc sort data=old;
        by name;
    run;

    data new (keep=name date1-date3);
        set old;
        by name;
        array dates {3} date1-date3;
        retain date1-date3;
        if first.name then do;
            i = 1;
            call missing(of date1-date3);   /* do not carry dates over from the previous name */
        end;
        else i + 1;
        dates{i} = date;
        if last.name;                       /* output one record per name */
    run;
Similar results can be achieved with PROC TRANSPOSE:
    proc transpose data=old out=new prefix=DATE;
        var date;
        by name;
    run;

If I have a dataset with different subjids and each subjid has many records, how can I obtain the last but one record for each patient?
Sorting by subjid and using IF LAST.SUBJID returns the last record, not the last but one. One way is to number the records within each subject, reverse the order and keep the second record:
    proc sort data=old;
        by subjid;
    run;

    data seq;
        set old;
        by subjid;
        if first.subjid then recno = 0;
        recno + 1;                          /* record number within subject */
    run;

    proc sort data=seq;
        by subjid descending recno;
    run;

    data new (drop=recno fromlast);
        set seq;
        by subjid;
        if first.subjid then fromlast = 0;
        fromlast + 1;                       /* 1 = last record, 2 = last but one */
        if fromlast = 2;
    run;
Subjects with only one record do not appear in the result.

What are the stat procedures you used?
FREQ, MEANS, GLM, SUMMARY etc.

Can you get the value of a DATA step variable to be used in any other program you run later in the same SAS session? How do you do that?
Store the value in a macro variable with CALL SYMPUT inside the DATA step. A %PUT statement can then be used to display the macro variable in the log.

What would you do if you have to access the previous record's values in the current record?
Use the LAG function.

What is a p-value? Why should you calculate it? Which procedures can you use for that?
If the p-value were greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable. Note that this is an overall significance test assessing whether the group of independent variables, when used together, reliably predicts the dependent variable; it does not address the ability of any particular independent variable to predict the dependent variable. Using PROC FREQ, PROC ANOVA, PROC GLM and PROC TTEST we can calculate the p-value.

Did you ever create efficacy tables?
Yes, I have created efficacy tables. Efficacy tables are developed to present the information about the primary objectives/parameters of the study.

Can you use all the DATA step functions in a macro definition?
Yes. A macro definition can contain DATA step code that uses any DATA step function, and most functions can also be called from the macro language itself through %SYSFUNC.

Can you tell me something regarding the macro libraries?
These are the libraries which store all the macros required for developing the TLGs of the clinical trial. They are necessary for controlling and managing the macros. With a %INCLUDE statement, the macros stored in the macro library can be brought into a program and called.

How do you use a macro which was created by other people and which is in some folder other than the SAS one?
With a SAS autocall library, using the SASAUTOS system option.

Did you use ODS?
Yes, I have used ODS (Output Delivery System), which is normally used to make the output of the tables, listings and graphs look presentable. ODS creates output in HTML, PDF and RTF formats.

Your resume says you created HTML, RTF and PDF. Why did you have to create three? Can you tell me specifically why each form is used?
There are several formats in which to create SAS output. We generally create the SAS output in RTF, because RTF can be opened in Word or other word processors. If we need to send printable reports through email, we need to create the output in PDF. PDF output is also needed when we send documents required to file an NDA to the FDA. To publish or place the output on the Internet, we need to create the output in HTML format.
General syntax (PDFSECURITY= and PDFPASSWORD= are set as system options in current SAS releases):
    options pdfsecurity=None|Low|High pdfpassword=(open="open" owner="owner");
    ods pdf file="Abc.pdf" startpage=never;
    SAS statements ...;
    ods pdf startpage=now;
    Proc1 statements ...;
    Proc2 statements ...;
    ods pdf close;

Which procedure do you usually use to create reports?
PROC REPORT and DATA _NULL_.

What do you usually do with PROC LIFETEST?
PROC LIFETEST is used to obtain Kaplan-Meier and life table survival estimates (and plots).

Can you generate statistics using PROC SQL?
Yes, we can generate statistics like N, Mean, Median, Min, Max, STD and SUM using PROC SQL. But the SQL procedure does not calculate all of the above statistics by default, as is the case with PROC MEANS; each one has to be requested explicitly.
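As a rough illustration of generating summary statistics with PROC SQL, the sketch below uses the SASHELP.CLASS sample dataset that ships with SAS. The column aliases are arbitrary choices. The median is left out here because its behaviour as a summary function depends on the SAS release.

```sas
/* Summary statistics by group; every statistic must be requested explicitly */
proc sql;
    select sex,
           count(height) as n_height,
           mean(height)  as mean_height,
           min(height)   as min_height,
           max(height)   as max_height,
           std(height)   as std_height,
           sum(height)   as sum_height
    from sashelp.class
    group by sex;
quit;
```

The same figures come out of PROC MEANS with a CLASS statement, which is a convenient way to cross-check the SQL result.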
How do you delete a macro variable?
If the macro variable is stored in the global symbol table it is easy to delete: use the %SYMDEL statement. Multiple variables may be deleted by listing their names in the same statement.

Why do you have to use the PROC IMPORT and PROC EXPORT wizards? Give me the situation.
These two help us transfer files/data between SAS and external data sources.

What is SAP / Protocol?
The protocol provides the overall framework under which the study is to be conducted and includes the type of analysis planned, for example the administration schedule of the drug and the study design.
The SAP includes the technical and detailed elaboration of the principal features of the analysis described in the protocol. It explains detailed procedures for executing the statistical analysis for primary and secondary endpoints. The SAP includes:
1. Variable derivation
2. Visit window definition
3. Planned interim analysis
4. LOT
5. How to deal with missing values

When do you prefer PROC SQL? Give me some situation.
The SQL procedure supports almost all the functions available in the DATA step for the creation as well as the manipulation of data. When we compare the same result obtained from SQL and from the DATA step, PROC SQL often requires less code and, in some cases, less time to execute.

What is the difference between SAS 8 and SAS 9? OR: what changed between SAS 8 and 9?
1. The installation directory of version 8 is V8, and in the same way it is V9 for version 9. SAS 8 and 9 share the same components, so be careful while uninstalling if they are installed on the same machine.
2. Version 9 supports format and informat names longer than 8 bytes. A numeric format name can be up to 32 characters and a character format name up to 31. This is the only change in dataset structure. V9 datasets are backward compatible if they conform to V8 naming conventions.
3. Three new informats are available to read various date, time and datetime forms of data into a SAS dataset:
a. ANYDTDTEw. - to convert date values
b. ANYDTTMEw. - to convert time values
c. ANYDTDTMw. - to convert datetime values
It is important to note that these informats make assumptions on a record basis, so ambiguous values can be interpreted incorrectly. The DATESTYLE system option is now available to set the default assumption for the date to be either DMY, MDY or YMD.
4. ODS Document Viewer is a new addition in SAS 9 for viewing documents created with the ODS statement.
5. A dataset can be referenced by its real physical location, and leading zeros are supported when using the INTO clause.
6. In SAS 9, CALL SORTN / CALL SORTC are quick ways to sort variable values inside the DATA step. It is a simpler way of ordering values of the same structure. It is not designed to replace PROC SORT.
7. Multithreaded architecture: one of the biggest enhancements in SAS 9 is its ability to support multithreaded access to files for use in the DATA step and certain procedures, for example PROC SORT sorting the data on four processors by dividing it into equal chunks, or PROC MEANS summarizing chunks of the data and then combining the results into the final summary. Some of the procedures which support multithreading are:
a. PROC SORT
b. PROC SUMMARY
c. PROC MEANS
d. PROC REPORT
e. PROC TABULATE
f. PROC SQL
By default multithreading is on in version 9 for all these procedures. Therefore there is a new option for each procedure (THREADS/NOTHREADS) to optionally turn this feature off. The CPUCOUNT option limits the number of CPUs used for the multithreaded work. The default is to use the maximum number of CPUs available.

What is ICH and its guidelines?
The International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use is a unique project that brings together the regulatory authorities of Europe, Japan and the United States. The objective of such harmonization is a more effective use of human, animal and material resources, and the elimination of unnecessary delay in the global development process, whilst maintaining the safety, quality and efficacy needed to protect public health.

How would you validate a TLG? OR: how do you validate a program?
Check for:
· Conditions expected from the input data (the kind of data we are dealing with):
  · extreme values expected
  · range checking
  · number of observations
  · handling of missing values
· All pathways through the code, to find:
  · logical dead-ends
  · infinite loops
  · code never executed
  · algorithms / mathematical calculations
· Number of observations / variables tracked through the program
· The _ERROR_ flag in the data
· PROC PRINT before and after DATA steps, then compare the results; use OBS= to limit the number of observations printed and FIRSTOBS= to skip past a trouble spot
· PROC COMPARE to confirm 100% agreement between two datasets (length, label, format and informat)

What are the data and basic syntax/coding errors?
· errors or omissions in DATA step coding
· array subscript out of range
· uninitialized variables
· invalid data
· hanging DOs or ENDs
· invalid numeric operations
· type conversions (automatic)
· warning and informational messages in the log that point to errors in:
  · DROP, KEEP
  · DELETE
  · BY
  · OUTPUT
  · MERGE (repeats of BY values)
  · subsetting IFs

What has been your most common programming mistake?
1. Missing semicolon at the end of a statement.
2. Misspelling a keyword.
3. Libraries / datasets spelled incorrectly.
4. Not executing a macro after writing it; it needs a %macroname call to execute.

Is it possible to reference a GLOBAL and a LOCAL macro variable separately when they have the same name?
    %global abc;
    %let abc=first;
    %macro test();
        %local abc;
        %let abc=second;
        %put "inside macro &abc";
    %mend test;
    %test;
    %put "Outside macro &abc";
In this example the first %PUT statement outputs second and the next %PUT statement outputs first, because the scope of each variable is stated when it is defined.

CALL SYMPUT
SYMPUT is a routine that you CALL inside a DATA step to produce a global symbolic (macro) variable when the step runs in open code. The value of the macro variable is assigned during the execution of the DATA step. The variable cannot be referenced inside the DATA step that creates it; the DATA step must end with a RUN statement before you reference the variable in your code.
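A small sketch of the CALL SYMPUT behaviour described above, using CALL SYMPUTX (the SAS 9 variant that also trims blanks) and the SASHELP.CLASS sample dataset. The macro variable name MAXAGE is a hypothetical choice for illustration.

```sas
/* Find the largest AGE and store it in a macro variable at the last record */
data _null_;
    set sashelp.class end=last;
    retain maxage;
    maxage = max(maxage, age);           /* MAX ignores the initial missing value */
    if last then call symputx('maxage', maxage);
run;                                     /* the step must finish before &maxage is usable */

/* The value is available to later code in the same session */
%put NOTE: oldest age in SASHELP.CLASS is &maxage;
```

Referencing &maxage inside the creating DATA step would resolve before the step executes, which is why the reference sits after the RUN statement.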
Name several ways to achieve efficiency in your program. Explain trade-offs.
1. Using the macro facility, so that code can be reused at different program phases in the same session without writing it again and again.
2. Dropping / keeping only those variables which are of significance to the program.
3. If using SAS 9, making sure multithreading is being used by the procedures which support it.
4. Using a CLASS statement in place of BY where possible, as it avoids the time spent sorting the datasets.

What is PROC CDISC?
It is a new SAS procedure that is available as a hotfix for SAS 8.2 and comes as part of SAS 9.1.3. PROC CDISC allows us to import (and export) XML files that are compliant with the CDISC ODM version 1.2 schema.

What is the benefit of using PROC COPY instead of just copying and pasting datasets?
PROC COPY is used to copy multiple datasets from one physical location to another, and also to create a transport file of datasets for sending them electronically. It creates an .xpt file in which the datasets are encoded for transfer.
    libname source 'SAS-data-library-on-sending-host';
    libname xptout xport 'filename-on-sending-host';
    proc copy in=source out=xptout memtype=data;
        select bonus budget salary;
    run;
Windows example, creating the transport file:
    libname olddata 'c:\sas';              * This is the location of the SAS data sets;
    libname plum xport 'c:\abc.xpt';       * This is what the transport file is called;
    proc copy in=olddata out=plum;
        select chars cleaning;             /* Dataset names */
    run;
Windows example, reading the transport file back:
    libname olddata 'c:\';                 * New SAS data sets will be written here;
    libname pear sasv5xpt 'c:\abc.xpt';    * This is what the transport file is called;
    proc copy in=pear out=olddata;
        select cleaning;
    run;

How did you use PROC PRINTTO?
When you use the PRINTTO procedure with its LOG= and PRINT= options, you can route the SAS log or SAS procedure output to an external file or a fileref from any mode. PRINT= directs the output of a procedure and LOG= directs the log output. Specify the external file or the fileref in the PROC PRINTTO statement. For example:
    proc printto print='/u/myid/output/prog1' new;
    run;
The NEW option causes any existing information in the file to be cleared. If you omit the NEW option from the PROC PRINTTO statement, the SAS log or procedure output is appended to the existing file. If you plan to specify the same destination several times in your SAS program, you can assign a fileref to the file using a FILENAME statement:
    filename myoutput printer;
    filename myoutput "c:\abc.rpt";

DATA _NULL_ usage
Inside a macro, a record count can be stored in a macro variable and used to print a message when no data match the criteria:
    proc sql noprint;
        select count(*) into :n1 from datasetname;
    quit;
    %if &n1 <= 0 %then %do;
        data _null_;
            file print;
            put // "No data matching the criteria";
        run;
    %end;

Usage of PROC UNIVARIATE?
The UNIVARIATE procedure provides data summarization tools and information on the distribution of numeric variables. PROC UNIVARIATE:
• calculates the median, mode, range, and quantiles
• calculates confidence limits
• tabulates extreme observations and extreme values
• generates frequency tables

PROC GLM
PROC GLM analyzes data within the framework of general linear models. PROC GLM handles models relating one or several continuous dependent variables to one or several independent variables. The independent variables may be either classification variables, which divide the observations into discrete groups, or continuous variables.

What is LOCF?
Pharmaceutical companies conduct longitudinal studies on human subjects that often span several months. It is unrealistic to expect patients to keep every scheduled visit over such a long period of time. Despite every effort, patient data are not collected for some time points, and these eventually become missing values in a SAS dataset. For reporting purposes, for example for summary statistics, the most recent previously available value is substituted for each missing visit. This is called Last Observation Carried Forward (LOCF).
LOCF doesn't mean last SAS dataset observation carried forward. It means last non-missing value carried forward. It is the values of individual measures that are the "observations" in this case. If you have multiple variables containing these values, they are carried forward independently.
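The LOCF rule described above can be sketched with a retained helper variable. The LABS dataset, its variables (SUBJID, VISIT, RESULT) and its values are hypothetical, and the data are assumed to be sorted by subject and visit.

```sas
/* Hypothetical lab data with missing results at some visits */
data labs;
    input subjid visit result;
    datalines;
1 1 5.1
1 2 .
1 3 4.8
2 1 .
2 2 6.0
2 3 .
;
run;

data labs_locf;
    set labs;
    by subjid;
    retain last_result;
    if first.subjid then last_result = .;              /* never carry across subjects */
    if not missing(result) then last_result = result;  /* remember the latest non-missing value */
    result_locf = last_result;                         /* original or carried-forward value */
    drop last_result;
run;
```

A missing value at a subject's first visit stays missing, since there is nothing earlier to carry forward. With several measures, each one would get its own retained helper so that the values are carried forward independently.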
PROC SUMMARY
PROC SUMMARY produces an output dataset similar to PROC MEANS, but no printed output is generated. Thus it is essentially equivalent to PROC MEANS using the NOPRINT option. The VAR statement lists the variables for which statistics are desired. An OUTPUT statement builds a dataset containing the specified statistics (SUMs and MEANs). For example:
    PROC SUMMARY;
        BY city;
        ID area;
        VAR income;
        OUTPUT OUT=two SUM=t_income MEAN=m_income;
    RUN;
Syntax of PROC SUMMARY:
    PROC SUMMARY DATA=TEST NWAY;
        CLASS SEX COUNTRY STATE;
        VAR EXERLOSS WEITLOSS AEROLOSS;
        OUTPUT OUT=TEST1 SUM=;
    RUN;
The resulting dataset contains the summed totals for each loss variable and is ordered as expected: alphabetically by STATE within COUNTRY within SEX.

Q. How do you set up a study or how do you start?
Ans. Along with the LOT we are provided with mockups of the TFLs, which show how the statistician wants each TFL to be presented. Sometimes the LOT or the mockups are not provided; in that case we have to read the Protocol and SAP in detail to find out which TFLs are required for the submission of that particular study. This is called requirement review. The requirement review is checked by the statisticians or study team, who then give us the green signal for production of the TFLs. Then the Tables, Figures and Listings are decided and the LOT is prepared, which helps divide the work between the team members, and that is how study setup starts.

Q. How do you create an analysis dataset / what is the process of creating an analysis dataset?
Ans. It starts with working closely with the statisticians / study team on the requirements and the programming plan, which includes step-by-step detail of the important endpoints (primary or secondary) that need to be derived and how they will be derived, keeping the SAP and Protocol in mind.
Study-specific datasets like efficacy always have requirements that include details such as variable name, length, label and derivation. For standard datasets like Demog, Vital Signs and Concomitant Medication we normally don't get specific requirements; these datasets are generated by standard production code and then quality checked by independent programmers. But sometimes even on a standard dataset the statistician needs specific endpoints to be derived, for example Adjuvant and Neoadjuvant therapy classification along with demographics information, and then we have to program the datasets and tables with those requirements in mind. So we have to understand the requirements and derivations, and before programming starts the requirements are checked and approved by the study manager.
Another important thing to keep in mind while generating an analysis dataset is to drop unused variables or data which do not belong to the purpose of that dataset; this improves the performance of the dataset in further report generation.
Finally, the analysis dataset is quality checked by an independent programmer, who derives it independently from the requirements and runs PROC COMPARE on the two datasets to ensure quality.

Q. How do you QC a raw dataset, or how do you make sure the quality of raw data is good?
Ans. It is important that the raw dataset is free of data issues so that the reports show correct statistics for the various data collections. The raw data are checked while performing the quality check of the tables which are based on that dataset. In the quality check of tables, different permutations and combinations of the data from a particular raw dataset are tried. For example, if the CRF says that a variable should have only 3 values, then running PROC FREQ and checking how many unique values are in the data is useful. Or, checking against the demog data, a male subject having a value for childbearing potential would be a data issue. Another way of checking for data issues is to confirm whether particular data should repeat in the raw dataset or must be unique: for example, there should be only one screening record per subject, but labs will have multiple entries for the different visits of laboratory tests.

Q. How do you set up TFLs or generate TFLs?
Ans. After the VADs are ready, TFL generation begins based on whatever is mentioned in the LOT. TFLs need to have the proper number, title, footnote, source data and other mandatory information requested by the client, with the formats and table structure shown in the mockups. All the TFLs need to be consistent with each other in the treatment arms and the type of population displayed.

Q. How do you deal with ad hoc reports, or what kind of ad hoc reports have you worked on?
Ans. After tables are delivered, we sometimes receive ad hoc requests. The statistician will ask for a particular table which is already in the original deliverable but by a different grouping, such as by age, race or gender, or a table by each cycle. These kinds of ad hoc requests have to be dealt with quickly and diligently, by understanding the request clearly and doing programming and quality check in parallel.

Q. When is a particular submission or deliverable ready?
Ans. When:
1. Comparison against the mockup is done
2. None of the requirements are missed
3. None of the crucial data for the reports is missed out
4. All the statistical information required is present in the TFLs
5. It is presented in the layout and order in which it was requested
6. Logs of the programs are clean, with no errors or potential warnings
7. QC for VADs and TFLs has been finished
8. Numbers are presented in an appropriate manner and with specific units

Q. What do you understand by different protocols of the same compound?
Ans. Sometimes a drug is:
1. Compared against a comparator drug in the market
2. Compared against placebo
3. Compared across different age groups
4. Compared across different dose levels

What is the Program Data Vector (PDV)? What are its functions?
Function: to store the current observation. The PDV (Program Data Vector) is a logical area in memory where SAS builds a dataset, one observation at a time. When SAS processes a DATA step it has two phases: the compilation phase and the execution phase. During the compilation phase the input buffer is created to hold a record from the external file. After the input buffer is created, the PDV is created. The PDV contains two automatic variables, _N_ and _ERROR_.

At compile time, when a SAS dataset is read, what items are created?
Automatic variables, the input buffer, the PDV and descriptor information.
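One way to look at the contents of the PDV, including the automatic variables _N_ and _ERROR_, is to write every variable to the log with PUT _ALL_. The sketch below assumes the SASHELP.CLASS sample dataset is available.

```sas
/* Write the full PDV to the log for the first three observations */
data _null_;
    set sashelp.class(obs=3);
    put _all_;   /* every variable in the PDV, including _N_ and _ERROR_ */
run;
```

The log then shows one line per iteration, which makes it easy to see _N_ increase as each observation is read.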
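What is _N_?
_N_ is a data counter variable in SAS. It indicates the number of times SAS has looped through the DATA step. This is not necessarily equal to the observation number, since a simple subsetting IF statement can change the relationship between the observation number and the number of iterations of the DATA step.
If we want to pick every third record in a dataset (records 1, 4, 7, ...), we can use _N_ as follows:
    data sasdatasetname;
        set old;
        if mod(_n_, 3) = 1 then output;
    run;
The _ERROR_ variable has a value of 1 if there is an error in the data for that observation and 0 if there is not.

In the flow of DATA step processing, what is the first action in a typical DATA step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

Name statements that are recognized at compile time only.
Drop, Keep, Rename, Label, Length, Format, Informat, Attrib, Array, By, Where, Retain.

Name statements that are execution only.
INFILE, INPUT.

Identify statements whose placement in the DATA step is critical.
DATA, INPUT, RUN.

Name some of the SAS functions you use on a daily basis.
Input, Put, Substr, Lag, Sum, Mean(), Today(), Day(), Varnum.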
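A few of the functions listed above, applied to the SASHELP.CLASS sample dataset. The new variable names are arbitrary and only meant to illustrate each call.

```sas
data demo;
    set sashelp.class(obs=5 keep=name age height weight);
    prev_age = lag(age);               /* AGE from the previous record, missing on the first */
    initial  = substr(name, 1, 1);     /* first character of NAME */
    age_chr  = put(age, 3.);           /* numeric to character */
    age_num  = input(age_chr, 3.);     /* character back to numeric */
    avg_hw   = mean(height, weight);   /* mean of the non-missing arguments */
    run_date = today();                /* current date as a SAS date value */
    format run_date date9.;
run;
```

LAG returns the value from the previous execution of the function, so it should be called on every record rather than inside a condition if a true "previous record" value is wanted.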
